Safety-Critical Software:

Status  Report  and  Annotated  Bibliography 


Abstract:  Many  systems  are  deemed  safety-critical  and  these  systems  are 
increasingly  dependent  on  software.  Much  has  been  written  in  the  literature 
with respect to system and software safety. This report summarizes some of
that literature and outlines the development of safety-critical software.
Techniques  for  hazard  identification  and  analysis  are  discussed.  Further, 
techniques  for  the  development  of  safety-critical  software  are  mentioned.  A 
partly  annotated  bibliography  of  literature  concludes  the  report. 


1  Introduction 

This  chapter  discusses  the  reasons  for  writing  this  report  and  the  role  of  safety-critical  software 
in  requirements  engineering.  Some  background  material  suggesting  reasons  for  the  current 
increase in interest in safety-critical software is presented.

1.1 Purpose of This Report

The purpose of the report is to bring together concepts necessary for the development of
software in safety-critical systems. An annotated bibliography may be used as a reference base
for  further  study. 

Although this report was produced by members of the requirements engineering project, it
covers aspects of software development outside the restricted area of requirements engineering.
This is due, in part, to the nature of the literature surveyed, which discusses all aspects of
software development for software in safety-critical systems. Also, the project members take the
view  that  specification  and  analysis  are  part  of  the  requirements  engineering  process  and  are 
activities  performed  as  soon  as  system  requirements  have  been  elicited  from  the  appropriate 
sources. 

The  report  is  not  intended  as  a  tutorial  on  any  specific  technique,  though  some  techniques  are 
highlighted  and  discussed  briefly.  Interested  readers  should  turn  to  appropriate  literature  for 
more  detailed  information  on  the  use  of  the  techniques  described  herein.  There  has  been  a 
great deal of recent activity in the application of formal methods to safety-critical software
development and we will outline, later in this report, the classes of formal method and how they
may  be  used.  We  do  not  concentrate  on  specific  methods  since  a  method  should  be  chosen 
to  match  the  system  under  construction.  Instead,  we  discuss  options  where  the  developers 
may  choose  one  type  of  method  over  another. 

1.2 Requirements Engineering and Safety

Standards  exist  that  state  that  all  safety-critical  components  of  a  system  must  be  developed 
in  a  particular  way.  Given  that  the  required  development  techniques  may  be  more  costly  than 
current  techniques,  or  be  within  the  capabilities  of  a  limited  number  of  the  staff,  it  is  important 
to minimize the proportion of the system that has to be developed according to the safety
standard. The requirements engineer has the opportunity to manipulate the requirements to
minimize the safety-critical subsystems while maintaining an overall required level of safety for the
entire system. Generally, a well-designed system will have few safety-critical components in
proportion to the total system. However, these components may prove to be some of the
hardest components to develop since their design and development require a system-level rather
than a component-level understanding.

It  is  clear  that  safety  must  be  considered  from  the  start  in  the  development  of  a  system.  This 
means  considering  issues  of  safety  at  the  concept  exploration  phase,  the  demonstration  and 
validation  phase,  and  the  full  scale  development  phase.  Safety  concerns  often  conflict  with 
other development concerns such as performance or cost. Decisions should not be made
during development for reasons of performance or cost that compromise safety without
performing an analysis of the risk associated with the resultant system. The safety of a system is
considered by understanding the potential hazards of the system, that is, the potential
accidents that the system may cause. Once the hazards are understood, the system may be
analyzed in terms of the safety hazards of the components of the system, and each component
may be analyzed in the same way, leading to a hierarchy of safety specifications.

The  development  of  the  requirements  specification  is  a  part  of  the  requirements  engineering 
phase;  indeed,  the  product  of  requirements  engineering  should  be  the  specification  for  use  by 
the developers. During the requirements engineering phase, design decisions are made
concerning the allocation of function to system components; it is at this stage that decisions
concerning overall system safety must be made. The specification acts as the basis for both
development  and  testing. 

An important objective of requirements engineering is the elimination of errors in the
requirements. These errors typically occur in two forms: misunderstood customer desires or poorly
conceived customer requests. The implication of this is that the requirements engineering
process must analyze the requirements for both desirable and undesirable behaviors.

Safety is a system-level issue and cannot be determined by examining the safety of the
components in isolation. The approach taken is to develop a system model which represents a
safe system. If the model does not represent a safe system, the system will never be safe, since
the model is used as the basis for analysis and further development. The developers are led into
developing components of the system in isolation and the system integrators put these
components together. Although each of the individual components may be safe, the integrated
system may not be safe and may well be untestable for safety given the infeasibility of
generating sufficient test cases for a reliable and safe system.

Many systems cannot be feasibly tested in a live situation. For example, systems such as
nuclear power plant shutdown systems, aircraft flight control systems, or critical components of
strategic  weapons  systems  cannot  be  adequately  tested  because  it  would  be  necessary  to 
create  a  hazardous  situation  in  which  failure  would  be  disastrous. 

Customers' requirements are usually presented in many forms, examples being natural
language descriptions, engineering diagrams and mathematics. In order to engineer a safe
system, each customer's requirements must generally be organized into a coherent
form that may be analyzed in a cost-effective manner.

Formal specification techniques provide notations appropriate for the specification and
analysis of systems or software that cannot be tested in a live situation. These techniques provide
notations that may be used to model the customer’s desires [186]. Instead of relying on
potentially ambiguous natural language statements, the specifications describe the system using
mathematics with only one possible interpretation, which may be analyzed for defects. When
completed, the formal specification forms a model of the system and may be used to predict
the behavior of that system under any given set of circumstances. Thus, the safety of the
system may be estimated by using the model to predict how the system will react to a given
sequence of potentially hazardous events. If the model behaves according to the customer's
notions of safety, then we can have confidence that a system conforming to the specifications
will be safe.
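
As an illustration only, the idea of using a model to predict the reaction to sequences of
hazardous events can be sketched as a small state machine. The states, events, and safety
property below are hypothetical and are not taken from any system discussed in this report;
a real formal specification would be written in a notation chosen to match the system.

```python
from itertools import product

# Hypothetical model of a shutdown controller: (state, event) -> next state.
TRANSITIONS = {
    ("running", "overheat"): "shutdown",
    ("running", "sensor_fault"): "shutdown",
    ("running", "operator_reset"): "running",
    ("shutdown", "overheat"): "shutdown",
    ("shutdown", "sensor_fault"): "shutdown",
    ("shutdown", "operator_reset"): "running",
}
EVENTS = ("overheat", "sensor_fault", "operator_reset")


def run(events, state="running"):
    """Return the list of states the model visits for a sequence of events."""
    trace = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        trace.append(state)
    return trace


def is_safe(events):
    """Assumed safety property: right after every 'overheat' the model is shut down."""
    trace = run(events)
    return all(
        trace[i + 1] == "shutdown"
        for i, event in enumerate(events)
        if event == "overheat"
    )


def unsafe_sequences(max_len):
    """Check every event sequence of length 1..max_len and collect the violations."""
    return [
        seq
        for n in range(1, max_len + 1)
        for seq in product(EVENTS, repeat=n)
        if not is_safe(seq)
    ]
```

An empty result from `unsafe_sequences` says only that the model satisfies the stated property
for the sequences examined; confidence in the delivered system still depends on that system
conforming to the model.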

1.3  Background 

The  use  of  software  is  increasing  in  safety-critical  components  of  systems  being  developed 
and delivered. Examples of systems using software in place of hardware in safety-critical
systems are the Therac 25 (a therapeutic linear accelerator) and nuclear reactor shutdown
systems (Darlington, Ontario, is the best publicized example). There are many other instances of
introduction  of  software  into  safety-critical  systems. 

In  many  cases  the  new  software  components  are  replacing  existing  hardware  components. 
The introduction of software into such systems introduces new modes of failure for the
systems which cannot be analyzed by the traditional engineering techniques. This is because
software fails differently from hardware; software failure is less predictable than hardware
failure.


1.4 Structure of the Report

The report collects a number of topics relating to requirements engineering and the
subsequent development of systems with safety-critical components. Chapter 2 is a collection of
themes that recur throughout the literature with some commentary on each theme. Chapter 3
describes the various techniques used to determine which parts of a system are safety critical
and which are not. Chapter 4 discusses development techniques applicable to the
development of safety-critical systems throughout major phases of implementation. Chapter 5
describes a number of current standards pertaining to the development of software for
safety-critical systems. This chapter also discusses some concerns on the usefulness of standards
and some harmful effects that a standard may create. Chapter 6 discusses the conclusions
drawn while writing this report. This chapter also discusses potential avenues for further work
in requirements engineering and development of software in safety-critical systems. The
report concludes with an annotated bibliography of papers and books relating to safety-critical
systems.


2 Comments on Software Safety


This  chapter  collects  a  number  of  the  concepts  relating  to  safety-critical  software  that  may  be 
found in various journals and books; an extensive bibliography may be found at the end of this
report.  Each  section  presents  a  different  concept  and  some  discussion  of  that  concept. 


2.1  Safety  Is  a  System  Issue 

Leveson [117] and others make the point that safety is not a software issue; rather, it is a
system issue. By itself, software does nothing unsafe. It is the control of systems with hazardous
components  or  the  providing  of  information  to  people  who  make  decisions  that  have  potentially 
hazardous  consequences  that  leads  to  hazardous  systems.  Thus,  software  can  be  considered 
unsafe  only  in  the  context  of  a  particular  system. 

At the system level, software may be treated as one or more components whose failure may
lead to a hazardous system condition. Such a condition may result in the occurrence of an
accident.


2.2  Safety  Is  Measured  as  Risk 

Safety is an abstract concept. We inherently understand what we mean when we say, “This
system is safe.” Essentially, we mean that it will not cause harm either to people or property.
However, this notion is too simple to be useful as a statement of safety. There are many
systems that can be made completely safe, but making systems that safe may interfere with their
ability to perform their intended function. An example would be a nuclear reactor: the system
is perfectly safe, so long as no nuclear material is introduced into the system. Such a system
is, of course, not useful. Thus, the definition of safety becomes related to risk. Risk may be
defined as

$$\mathrm{Risk} = \sum_{\mathrm{hazard}} \varepsilon(\mathrm{hazard}) \times P(\mathrm{hazard})$$

where $\varepsilon(\mathrm{hazard})$ is a measure of the effects that may be caused by a particular mishap and
$P(\mathrm{hazard})$ is the probability that the mishap will occur.

We  will  not  further  define  how  risk  may  be  measured.  Examples  of  appropriate  measures 
would  be  in  terms  of  either  human  life  or  replacement  or  litigation  costs.  There  are  many  other 
measures  that  may  be  chosen  to  assess  risk.  However,  the  point  we  must  accept  is  that  no 
system  will  be  wholly  safe.  Instead,  we  must  attempt  to  minimize  the  risk  by  either  containing 
the  hazard  or  reducing  the  probability  that  the  hazard  will  occur. 
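
A minimal sketch of the risk definition above, assuming that the effect of each hazard is
expressed as a cost and that the hazards can be treated independently. The hazard names,
costs, and probabilities are invented for illustration and carry no empirical weight.

```python
def risk(hazards):
    """Sum effect * probability over every hazard in the mapping."""
    return sum(effect * probability for effect, probability in hazards.values())


# Hypothetical hazards: (effect as a cost in dollars, probability per year of operation).
baseline = {
    "overdose": (1_000_000.0, 1e-4),
    "fire": (250_000.0, 1e-3),
}

# Containing a hazard lowers its effect; here an assumed suppression system
# limits the damage a fire can do.
contained = dict(baseline, fire=(50_000.0, 1e-3))

# Reducing the probability leaves the effect unchanged; here an assumed
# interlock makes an overdose ten times less likely.
less_likely = dict(baseline, overdose=(1_000_000.0, 1e-5))
```

Comparing `risk(baseline)` with `risk(contained)` and `risk(less_likely)` shows the two routes
to lower risk named in the text: neither route brings the sum to zero.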

2.3 Reliability Is Not Safety

It is important to distinguish between the terms reliability and safety. According to definitions
from Deutsch and Willis [55], reliability is a measure of the rate of failure in the system that
renders the system unusable, and safety is a measure of the absence of unsafe software
conditions. Thus, reliability encompasses issues such as the system’s correctness with regard to
its  specification  (assuming  a  specification  that  describes  a  usable  system)  and  the  ability  of 
the  system  to  tolerate  faults  in  components  of  or  inputs  to  the  system  (whether  these  faults  are 
transient  or  permanent).  Safety  is  described  in  terms  of  the  absence  of  hazardous  behaviors 
in  the  system. 

As  can  be  seen,  reliability  and  safety  are  different  system  concepts:  the  former  describes  how 
well  the  system  performs  its  function  and  the  latter  states  that  the  system  functions  do  not  lead 
to an accident. A system may be reliable but unsafe. An example of such a system is an
aircraft avionics system which continues to operate under adverse conditions such as
component failure, yet directs a pilot to fly the aircraft on a collision course with another aircraft. The
system itself may be reliable; its operation, however, leads to an accident. The system would
be considered safe (in this case) if, on detecting the collision course, a new course was
calculated to avoid the other aircraft. Similarly, a system may be safe but unreliable. For example,
a railroad signalling system may be wholly unreliable but safe if it always fails in the most
restrictive way; in other words, whenever it fails it shows “stop.” In this case, the system is safe
even though it is not reliable.
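
The railroad signalling example can be sketched as follows. This is a hypothetical
illustration of failing in the most restrictive way, not a description of any real
signalling system; the function and sensor names are invented.

```python
STOP, PROCEED = "stop", "proceed"


def signal_aspect(track_is_clear):
    """Show PROCEED only when the check positively reports a clear track.

    Any failure of the check, or any answer other than True, yields STOP.
    """
    try:
        return PROCEED if track_is_clear() is True else STOP
    except Exception:
        return STOP


def broken_sensor():
    """Stand-in for an unreliable component that fails when queried."""
    raise RuntimeError("sensor offline")
```

A signal built this way may show “stop” far more often than it should, which makes it
unreliable, yet no failure of the check can produce a “proceed” indication.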

2.4  Software  Need  Not  Be  Perfect 

A  common  theme  running  through  the  literature  is  that  software  need  not  be  perfect  to  be  safe. 
In  order  to  make  some  sense  of  this  view,  we  need  to  understand  what  is  meant  by  perfection. 
Typically, we consider software to be perfect if it contains no errors, where an error is a
variance between the operation of the software and the user’s concept of how the software should
operate. (We use the term “user” here to mean either the operator or designer or procurer of
the software.) This notion of perfection considers all errors equal; thus, any error (from a
spelling mistake in a message to the operator to a gross divergence between actual and intended
function) means that the software is imperfect.

However, from a safety viewpoint, only errors that cause the system to participate in an
accident are of importance. There may be gross functional divergence within some parts of the
system, but if these are masked or ignored by the safety components, the system could still
be safe. As an example, consider a nuclear power plant using both control room software and
protection software. The control room software could, potentially, contain many errors, but as
long as the protection system operates, the plant will be safe. It may not be economical, it may
never produce any power, but it will not be an agent in an accident. Even within a system such
as the protection system, some bugs can be tolerated from the strictly safety viewpoint. For
example, the protection system might always attempt to shut down the reactor, regardless of
the condition of the reactor. The system is not useful, it contains gross functional divergence,
yet it is safe. This should be contrasted with a protection system that never attempts to shut
down the reactor regardless of reactor condition. This system also contains gross functional
divergence and is unsafe.
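
The contrast between the two faulty protection systems can be made concrete with a small
sketch. The temperature limit and the three protection functions are hypothetical; the
point is only that the safety condition and the usefulness condition are checked separately.

```python
LIMIT = 350.0  # assumed temperature above which the reactor must be shut down


def always_shutdown(temperature):
    return True


def never_shutdown(temperature):
    return False


def intended(temperature):
    return temperature > LIMIT


def is_safe(protection, temperatures):
    """Safety condition: shutdown is demanded at every reading above the limit."""
    return all(protection(t) for t in temperatures if t > LIMIT)


def is_useful(protection, temperatures):
    """Operational condition: no shutdown is demanded at or below the limit."""
    return not any(protection(t) for t in temperatures if t <= LIMIT)


READINGS = [300.0, 340.0, 350.0, 360.0, 400.0]
```

With these readings, `always_shutdown` meets the safety condition but not the operational
one, `never_shutdown` fails the safety condition, and only `intended` meets both.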

The  view  that  software  need  not  be  perfect  to  ensure  safety  of  the  entire  system  means  that 
developers  and  analysts  of  safe  software  can  concentrate  their  most  detailed  scrutiny  on  the 
safety  conditions  and  not  on  the  operational  requirements.  Indeed,  it  is  commonly  assumed 
that  other  parts  of  the  system  are  imperfect  and  may  not  behave  as  expected. 

2.5 Safe Software Is Secure and Reliable

We  have  already  discussed  the  differences  between  safety  and  reliability,  but  it  should  be 
clear  that  there  are  also  distinct  differences  between  safety  and  security.  Safety  does  depend 
on  security  and  reliability.  Neumann  discusses  hierarchical  system  construction  for  reliability, 
safety,  and  security  [169].  He  also  describes  a  hierarchy  among  these  concepts.  Essentially, 
security  depends  on  reliability  and  safety  depends  on  security  (hence  also  reliability). 

A  secure  system  may  need  to  be  reliable  for  the  following  reason.  If  the  system  is  unreliable, 
it  is  possible  that  a  failure  could  occur  such  that  the  system's  security  is  compromised.  When 
determining whether a system is secure, the analyst makes assumptions about the atomicity of
operations. If it is possible for the system to fail at any point, then the atomicity assumption may
no longer hold and the security analysis of the system will be invalidated. Of course, it is
possible for very carefully designed systems to be secure and unreliable, though the analysis for
such systems will be harder than the analysis for reliable systems.

The safety-critical components of a system need to be secure since it is important that the
software and data cannot be altered by external agents (software or human). If the data or
software can be altered, then the executing components will no longer match those that were
analyzed and shown to be safe; thus, we can no longer rely on the safety-critical components
to perform their function. This may, in turn, compromise system safety.

It is obvious that, for some systems, safety depends on reliability. Such systems require the
software to be operational to prevent mishaps; in other cases, it is possible to build systems
where a failure of the software still leads to a safe system. In the case of software that is not
fail-safe, if the safety system software is unreliable then it could fail to perform at any time,
including the time when the software is needed to avoid a mishap.

2.6  Software  Should  Not  Replace  Hardware 

One  of  the  advantages  of  software  is  that  it  is  flexible  and  relatively  easy  to  modify.  An  eco¬ 
nomic  advantage  of  software  is  that  once  it  has  been  developed,  the  reproduction  costs  are 
very  low.  Hardware,  on  the  other  hand,  may  be  quite  expensive  to  reproduce  and  is,  in  terms 
of production costs, the most expensive part of a system. (For development costs, current
wisdom indicates that the reverse is true, that the software development cost outweighs the
hardware development cost.) Thus, from an economic viewpoint, there is considerable temptation
to replace hardware components of a system with software analogs. However, there is a
danger to this approach that leads to unsafe systems.

Hardware  obeys  certain  physical  laws  that  may  make  certain  unsafe  behaviors  impossible.  For 
example,  if  a  switch  requires  two  keys  to  be  inserted  before  the  switch  can  be  operated,  then 
both  keys  must  be  present  before  the  switch  can  be  operated.  A  software  analog  of  this  system 
could be created and indeed, with a relatively simple system, we may be able to convince
ourselves of its correctness. However, as the software analogs become more complex, the
likelihood of a possible failure increases and the software may fail, permitting (in the case of our
example) the software analog switch to be operated without either of the key-holders being
present.
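
A software analog of the two-key switch might look like the sketch below. The class and its
key-holder names are hypothetical. Note what the sketch cannot show: in the physical switch
the rule is enforced by the mechanism itself, whereas here it holds only as long as this
code, and everything that can reach its state, is correct.

```python
class TwoKeySwitch:
    """Software analog of a switch that needs two distinct keys to operate."""

    HOLDERS = frozenset({"A", "B"})  # assumed identifiers for the two key-holders

    def __init__(self):
        self._inserted = set()

    def insert_key(self, holder):
        if holder not in self.HOLDERS:
            raise ValueError(f"unknown key-holder: {holder!r}")
        self._inserted.add(holder)

    def remove_key(self, holder):
        self._inserted.discard(holder)

    def operate(self):
        """Operate the switch only while both keys are inserted."""
        if self._inserted != self.HOLDERS:
            raise PermissionError("both keys must be inserted")
        return "operated"
```

At this size we may be able to convince ourselves that `operate` cannot succeed without both
keys. The argument becomes harder as the analog grows, for example once key state is shared
across processes or persisted between runs.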

A  concrete  example  of  this  behavior,  taken  from  Leveson  and  Turner  [141],  is  the  Therac  25 
radiation  treatment  machine.  A  predecessor  to  the  Therac  25,  the  Therac  20,  had  a  number 
of  hardware  interlocks  to  stop  an  undesirable  behavior.  Much  of  the  software  in  the  Therac  25 
was  similar  to  that  of  the  Therac  20  and  the  software  in  both  cases  contained  faults  that  could 
be  triggered  in  certain  circumstances.  The  Therac  25  did  not  have  the  hardware  interlocks  and 
where the Therac 20 occasionally blew fuses, the Therac 25 fatally irradiated a number of
patients.

Furthermore, hardware fails in more predictable ways than software, and a failure may be
foreseen by examining the hardware; for example, a bar may bend or show cracks before it fails.
These indicators of failure may occur long enough before the failure that the component may be
replaced before a failure leading to a mishap occurs. Software, on the other hand, does not exhibit
physical characteristics that may be observed in the same way as hardware, making the failures
unexpected and immediate; thus, there may be no warning of the impending failure.

The  concerns  raised  above  are  leading  to  the  development  of  systems  with  both  software  and 
hardware  safety  components.  Thus,  the  components  responsible  for  accident  avoidance  are 
duplicated  in  both  software  and  hardware,  the  hardware  being  used  for  gross  control  of  the 
system and the software for finer control. An example, taken from a talk by Jim McWha of
Boeing, is that of the Boeing 777. The design calls for a digital system to control the flight surfaces
(wing  flaps,  rudder,  etc.).  However,  there  is  a  traditional,  physical  system  in  case  of  a  software 
failure  that  will  permit  the  pilot  to  operate  a  number  (though  not  all)  of  the  flight  surfaces  with 
the  expectation  that  this  diminished  level  of  control  will  be  sufficient  to  land  the  aircraft  safely. 

2.7 Development Software Is Also Safety Critical

Safety analysis of a system is performed on a number of artifacts created during the
development of the system. Later stages in the development need not be analyzed under the following
circumstances:

1. The analysis of the current stage of the development shows that a system
performing  according  to  the  current  description  is  safe. 

2.  There  is  certainty  that  any  artifacts  created  in  subsequent  development 
stages  precisely  conform  to  the  current  description. 

The  earlier  a  system  can  be  analyzed  for  safety  with  a  guarantee  that  the  second  condition  will 
be  met,  the  more  cost  effective  will  be  the  overall  development  as  less  work  will  need  to  be 
redone  if  the  current  system  description  is  shown  to  be  unsafe.  The  disadvantage  is,  of  course, 
that the earlier the analysis is performed, the greater the difficulty of achieving the second
condition. Typically, the lowest level of software safety analysis performed will be at the level of
the implementation language, whether it be an assembly language or a high-level language.
In either case, the analyst is trusting that the assembler or the compiler will produce an
executable image that, when executed on the appropriate target machine, has the same meaning
as the language used by the analyst. Thus, the assembler or compiler may be considered to
be safety critical. This is so because if the executing code does not conform to the analyzed
system there is a possibility that the system will be unsafe.

Another part of the development environment that is critical is the production system. The
analyst must ensure that the system description that has been shown to be safe is the exact
same version as delivered to the system integrators. It is unsafe for an analyst to carefully
analyze one version of the software if another version is delivered. Thus, certain parts of the
development environment become critical. It is important that trusted development tools are used
to develop the software for safety-critical systems.
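
One way a production system could check that the delivered version is the analyzed version
is to compare cryptographic digests. The sketch below is an assumption of ours rather than a
technique drawn from the surveyed literature, and the choice of SHA-256 and the function
names are illustrative.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Return the SHA-256 digest of an artifact as a hexadecimal string."""
    return hashlib.sha256(artifact).hexdigest()


def matches_analyzed(delivered: bytes, analyzed_digest: str) -> bool:
    """True only if the delivered artifact is byte-for-byte the analyzed one."""
    return digest(delivered) == analyzed_digest
```

The analyst would record `digest(artifact)` when the analysis is signed off, and the
integrators would call `matches_analyzed` on what they receive. The check is only as
trustworthy as the tools that compute it and the channel that carries the recorded digest,
so those too become part of the critical development environment.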


3 Hazard Analysis Techniques


There are two aspects of the effort to perform a hazard check of a system: hazard
identification and hazard analysis. Although these will be presented as separate topics, giving the
impression that first the analyst performs all hazard identification and subsequently analyzes
the system to determine whether or not the hazards can occur or lead to a mishap, the two
activities may well be mixed. The general approach to hazard analysis is first to perform a
preliminary hazard analysis to identify the possible hazards. Subsequently, subsystem and
system hazard analyses are performed to determine contributors to the preliminary hazard
analysis. These subsequent analyses may identify new hazards, missed in the preliminary
hazard analysis, that must also be analyzed.

3.1  Hazard  Identification 

There  does  not  appear  to  be  any  easy  way  to  identify  hazards  within  a  given  system.  After  a 
mishap  has  occurred,  a  thorough  investigation  should  reveal  the  causes  and  lead  the  system 
engineers to a new understanding of the system hazards. However, for many systems, a
mishap should not be allowed to occur since the mishap’s consequences may be too serious in
terms  of  loss  of  life  or  property. 

The only acceptable approach for hazard identification is to attempt to develop a list of
possible system hazards before the system is built.

There is no easy systematic way in which all of the hazards for a system can be identified,
though it should be noted that recent work of Leveson and others [100] may prove to be an
appropriate way of determining if all of the safety conditions for the particular system have
been considered. The best qualified people to perform this task are experts in the domain in
which the system is to be deployed. Petroski [182] argues that a thorough understanding of
the history of failures in the given domain is a necessary prerequisite to the development of
the preliminary hazard list. However, this understanding of the history is not sufficient. The
experts need to understand the differences between the new system and previous systems so
that they can understand the new failure modes introduced by the new system.

The resources required to obtain an exhaustive list of hazards may be too great for a project.
Instead, the project management must use some approach to ensure that they have the
greatest likelihood of listing the system hazards. The obvious approach is to use “brainstorming,”
where the experts list all of the possible hazards that they envision for the system. Project
management also needs some guidelines to know when enough preliminary hazard analysis
has been done. One such guideline might be when the time between finding new hazards
becomes greater than some threshold value. While this is no guarantee that all the hazards have
been identified, it may be an indication that preliminary hazard analysis is complete and that
other hazards, if they exist, will have to be found during later phases of development. An
alternative may be to use a consensus-building approach so that the experts agree that they
have collected sufficient potential hazards for the preliminary hazard list. Approaches such as
the Delphi Technique or Joint Application Design (JAD) may be employed.
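
The threshold guideline mentioned above could be expressed as a simple stopping rule. The
sketch assumes that discovery times are recorded in some consistent unit, such as working
days since the analysis began; the function name and the unit are our own choices.

```python
def analysis_may_stop(discovery_times, now, threshold):
    """Suggest stopping when no new hazard has been found for longer than `threshold`.

    discovery_times: times at which new hazards were identified
    now:             current time, in the same unit
    threshold:       longest acceptable gap without a new hazard
    """
    if not discovery_times:
        # No hazards recorded yet: the guideline gives no basis for stopping.
        return False
    return now - max(discovery_times) > threshold
```

For example, with hazards found on days 1, 2, and 5 and a threshold of 3 days, the rule
suggests continuing on day 6 and stopping on day 9. As the text notes, this indicates only
that discovery has slowed, not that the list is complete.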

3.1.1  The  Delphi  Technique 

One  of  the  older  approaches  to  reaching  group  decisions  is  that  of  the  Delphi  Technique  [60]. 
This method was created by the RAND Corporation for the U.S. government and remained
classified until the 1960s. The rationale for the development of the Delphi Technique was that
there were many situations in which group consensus was required where the members of the
group were separated geographically and it was not possible to get all members of the group
together for a regular meeting. The method was originally designed for forecasting military
developments; however, it may be used for any situation where group consensus is required and
the group may not be brought together.

The  basic  approach  is  to  send  out  a  questionnaire  to  all  members  of  the  group  that  enables 
them  to  express  their  opinions  on  the  topic  of  discussion.  After  the  responses  to  the  question¬ 
naire  have  been  received  by  the  coordinator,  the  opinions  are  reproduced  in  such  a  way  that 
the author's identity is obscured and the opinions are collated. The collated opinions are sent
out  to  the  experts  who  may  agree  or  disagree  in  writing  with  the  opinions  and  justify  any  out¬ 
lying  opinions.  The  expectation  is  that  after  a  number  of  rounds  of  anonymous  responses,  the 
group  will  converge  to  produce  some  consensus  decision.  The  group  opinion  is  defined  as  the 
aggregate  of  individual  opinions  after  the  final  round. 
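
The coordinator's part of a round can be sketched as follows. This is a minimal illustration under stated assumptions: opinions are reduced to numeric estimates (here, a 1 to 10 severity rating for one candidate hazard), consensus is judged by the spread of the estimates, and all names and data are hypothetical.

```python
import random
import statistics


def collate(responses, rng=random):
    """Strip author names and shuffle so opinions cannot be attributed."""
    opinions = list(responses.values())
    rng.shuffle(opinions)
    return opinions


def summarize(opinions):
    """Aggregate one round: the median plus the range of the estimates."""
    return {
        "median": statistics.median(opinions),
        "spread": max(opinions) - min(opinions),
    }


def has_converged(opinions, tolerance):
    """Treat the round as a consensus when the spread is within tolerance."""
    return max(opinions) - min(opinions) <= tolerance


# Hypothetical data: severity estimates (1-10) for one candidate hazard.
round_1 = {"expert_a": 3, "expert_b": 9, "expert_c": 6, "expert_d": 7}
round_2 = {"expert_a": 6, "expert_b": 7, "expert_c": 6, "expert_d": 7}

for number, responses in enumerate([round_1, round_2], start=1):
    opinions = collate(responses)
    print(number, summarize(opinions), has_converged(opinions, tolerance=1))
```

The revision step between rounds is left to the experts; only the anonymous collation and the aggregate are automated here.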

The  key  idea  behind  the  Delphi  Technique  is  that  the  opinions  are  presented  anonymously  and 
that the only interaction between the experts is through the questionnaires. The idea is that
one  particularly  strong  personality  cannot  sway  the  opinion  of  the  entire  group  through  force 
of  will;  rather,  the  group  opinion  is  formed  through  force  of  reason.  The  Delphi  Technique  over¬ 
comes  the  issue  of  group  consensus  when  the  group  is  unable  to  attend  a  meeting  where  a 
method such as Joint Application Design might be employed. However, the nature of the Delphi
Technique makes for slow communication and it may take several weeks to arrive at consensus.
The use of electronic mail, a technology far newer than the Delphi Technique, may
help overcome this problem.

3.1.2 Joint Application Design

Joint  Application  Design  (JAD)  was  first  introduced  by  IBM  as  a  new  approach  to  developing 
detailed  system  definition.  Its  purpose  is  to  help  a  group  reach  decisions  about  a  particular  top¬ 
ic.  Although  the  original  purpose  was  to  develop  system  designs,  JAD  may  be  used  for  any 
meeting  where  group  consensus  must  be  reached  concerning  a  system  to  be  deployed. 

For  JAD  to  be  successful,  the  group  must  be  made  up  of  people  with  certain  characteristics. 
Specifically,  these  people  must  be  skilled  and  empowered  to  make  decisions  for  the  group 
they  represent.  Additionally,  it  is  important  for  the  right  number  of  people  to  be  involved  in  a 
JAD session. Conventional wisdom suggests that between six and ten is optimum. If there are
too  few  people  then  insufficient  viewpoints  may  be  raised  and  important  views  may  therefore 
be  lost.  If  there  are  too  many  people,  some  may  not  participate  at  all. 

A JAD session is led by a facilitator who should have no vested interest in the detailed content
of  the  design.  The  facilitator  should  be  chosen  for  reasons  of  technical  ability,  skills  in  commu¬ 
nication  and  diplomacy,  and  for  the  ability  to  maintain  control  over  a  group  of  people  that  may 
have  conflicting  views.  It  is  recommended  that  a  JAD  session  takes  place  in  a  neutral  location 
so  that  no  individual  or  group  of  people  feels  intimidated  by  the  surroundings.  A  further  advan¬ 
tage  is  that  there  should  be  fewer  interruptions  than  if  the  meeting  were  held  at  the  offices  of 
one or more of the attendees.

JAD  requires  an  executive  sponsor,  some  individual  or  group  of  people  who  can  ensure  the 
cooperation  of  all  persons  involved  in  the  system  design  and  development. 

It  is  important  for  the  ideas  presented  by  the  group  to  be  captured  immediately  and  to  develop 
a  group  memory.  For  JAD  to  operate  optimally,  ideas  should  become  owned  by  the  group  rath¬ 
er  than  individuals,  so  it  is  recommended  that  any  ideas  be  captured  by  the  facilitator  and  dis¬ 
played  for  all  to  see.  This  does  have  the  disadvantage  that  the  facilitator  can  become  a 
bottleneck.  There  should  be  well-defined  deliverables  so  that  the  facilitator  can  focus  the  meet¬ 
ing  and  ensure  that  the  group  makes  progress. 

3.1.3  Hazard  and  Operability  Analysis 

This  form  of  analysis,  also  known  as  operating  hazard  analysis  [145]  or  operating  and  support 
hazard  analysis,  applies  at  all  stages  of  the  development  life  cycle  and  is  used  to  ensure  a 
systematic  evaluation  of  the  functional  aspects  of  the  system. 

There  are  two  steps  in  the  analysis.  First,  the  designers  identify  their  concepts  of  how  the  sys¬ 
tem  should  be  operated.  This  includes  an  evaluation  of  operational  sequences,  including  hu¬ 
man  and  environmental  factors.  The  purpose  of  this  identification  is  to  determine  whether  the 
operators,  other  people,  or  the  environment  itself,  will  be  exposed  to  hazards  if  the  system  is 
used  as  it  is  intended.  The  second  step  is  to  determine  when  the  identified  conditions  can  be¬ 
come  safety  critical.  In  order  for  this  second  step  to  be  performed,  each  operation  is  divided 
into  a  number  of  sequential  steps,  each  of  which  is  examined  for  the  risk  of  a  mishap.  Obvi¬ 
ously,  the  point  in  the  sequence  where  an  operation  becomes  safety  critical  varies  from  sys¬ 
tem  to  system,  as  it  is  dependent  on  the  particular  part  of  the  operation,  the  operation  itself, 
and  the  likelihood  of  a  fault  occurring  in  that  step.  The  data  generated  from  the  analysis  can 
be organized into tables indicating the sequence of operations, the hazards that might occur
during  those  operations,  and  the  possible  measures  that  might  be  employed  to  prevent  the 
mishap.

Hazard and operability analysis is an iterative process that should be started before any de¬
tailed  design.  It  should  be  continually  updated  as  system  design  progresses. 

3.1.4 Summary

Both  the  Delphi  Technique  and  JAD  are  approaches  to  obtaining  group  consensus  on  some 
topic. Although neither of these techniques was designed for determining the preliminary hazard
list, it is clear that they can be used as a formal means for capturing an initial list of potential
system hazards. Participants could be drawn from development and regulatory organizations;
the facilitator should be drawn from some neutral organization.

Hazard  and  Operability  analysis  provides  a  structured  approach  to  the  determination  of  haz¬ 
ards  and  may  be  used  as  the  basis  for  the  decision-making  process. 

3.2  Hazard  Analysis 

The purpose of hazard analysis is to examine the system and determine which components
of the system may lead to a mishap. There are two basic strategies for such analysis that have
been  termed  inductive  and  deductive  [215].  Essentially,  inductive  techniques,  such  as  event 
tree  analysis  and  failure  modes  and  effects  analysis,  consider  a  particular  fault  in  some  com¬ 
ponent  of  the  system  and  then  attempt  to  reason  what  the  consequences  of  that  fault  will  be. 
Deductive  techniques,  such  as  fault  tree  analysis,  consider  a  system  failure  and  then  attempt 
to  reason  about  the  system  or  component  states  that  contribute  to  the  system  failure.  Thus, 
the  inductive  methods  are  applied  to  determine  what  system  states  are  possible  and  the  de¬ 
ductive  methods  are  applied  to  determine  how  a  given  state  can  occur. 

3.2.1  Fault  Tree  Analysis 

Fault  tree  analysis  is  a  deductive  hazard  analysis  technique  [215].  Fault  tree  analysis  starts 
with  a  particular  undesirable  event  and  provides  an  approach  for  analyzing  the  causes  of  this 
event. It is important to choose this event carefully: if it is too general, the fault tree becomes
large  and  unmanageable;  if  the  event  is  too  specific  then  the  analysis  may  not  provide  a  suf¬ 
ficiently  broad  view  of  the  system.  Because  fault  tree  analysis  can  be  an  expensive  and  time- 
consuming  process,  the  cost  of  employing  the  process  should  be  measured  against  the  cost 
associated with the undesirable event.

Once  the  undesirable  event  has  been  chosen,  it  is  used  as  the  top  event  of  a  fault  tree  dia¬ 
gram.  The  system  is  then  analyzed  to  determine  all  the  likely  ways  in  which  that  undesired 
event  could  occur.  The  fault  tree  is  a  graphical  representation  of  the  various  combinations  of 
events that lead to the undesired event. The faults may be caused by component failures, human
failures, or any other events that could lead to the undesired event (some random event
in the environment may be a cause). It should be noted that a fault tree is not a model of the
system or even a model of the ways in which the system could fail. Rather, it is a depiction of
the logical interrelationships of basic events that may lead to a particular undesired event.

The fault tree uses connectors known as gates, which either allow or disallow a fault to flow up
the tree. Two gates are used most often in fault tree analysis: the and gate and the or gate. For
example, if the and gate connector is used, then all of the events leading into the and gate must
occur before the event leading out of the gate occurs.

The  and  gate  (Figure  3-1)  connects  two  or  more  events.  An  output  fault  occurs  if  all  of  the  input 
faults  occur. 


Figure  3-1:  And  Gate 


Comparable  to  the  and  gate  is  the  or  gate  (Figure  3-2)  which  connects  two  or  more  events  into 
a  tree.  An  output  fault  occurs  from  an  or  gate  if  any  of  the  input  faults  occur. 


Figure  3-2:  Or  Gate 


Other  gates  that  may  be  used  in  fault  tree  analysis  are  exclusive  or,  priority  and,  and  inhibit 
gates. These gates will not be used in this report and will not be explained further; a full description,
however, may be found in the fault tree handbook [215].

Gates  are  used  to  connect  events  together  to  form  fault  trees.  There  are  a  number  of  types  of 
events  that  commonly  occur  in  fault  trees. 

The basic event (Figure 3-3) is a basic initiating fault and requires no further development.

The undeveloped event symbol (Figure 3-4) is used to indicate an event that is not developed
any further, either because there isn't sufficient information to construct the fault tree leading
to the event, or because the probability of the occurrence of the event is considered to be
insignificant.


The intermediate event symbol (Figure 3-5) is used to indicate a fault event that occurs whenever
the gate leading to the event has an output fault. Intermediate events are used to describe
an event that is the combination of a number of preceding basic or undeveloped events.


Figure 3-5: Intermediate Event

All of the events will generally contain text describing the particular fault that the event symbol
represents. 

The  basic  elements  of  a  fault  tree  are  gates  and  events.  These  may  be  tied  together  to  form 
fault  trees.  As  an  example,  consider  the  following  simple  fault  tree  in  Figure  3-6.  We  wish  to 
create  a  fault  tree  for  the  undesirable  event  of  a  car  hitting  a  stationary  object  while  driving  on 
a  straight  road.  As  can  be  seen  from  the  tree,  the  undesirable  event  is  represented  as  an  in¬ 
termediate  event  at  the  top  of  the  tree.  We  have  chosen  two  possibilities,  either  of  which  could 
lead  to  the  top  event;  these  are  that  the  driver  doesn't  see  the  object,  or  the  car  fails  to  brake. 
We could have added a third possibility, that the driver applied the brakes too late; however,
we did not do so in this example. We considered possible causes for the driver failing to see
the  object.  These  might  be  that  the  object  was  on  the  road  just  around  a  corner,  which  has 
been  represented  as  an  undeveloped  event,  or  that  the  driver  is  asleep  at  the  wheel,  a  basic 
event  of  the  system.  We  chose  to  represent  the  possibility  that  the  object  was  around  a  corner 
as  an  undeveloped  event  since  this  is  unlikely  given  that  the  road  is  a  long  straight  road  (from 
the  problem  definition);  however,  there  is  a  possibility  that  the  object  is  on  the  road  at  the  very 
start  of  that  road  and  that  the  driver  must  first  negotiate  a  corner  before  getting  onto  the  road. 
There  might  be  many  other  possibilities  why  the  driver  doesn't  see  the  object;  these  include 
fog,  the  driver  being  distracted,  the  driver  being  temporarily  blinded,  the  car  travelling  at  night 
without  lights,  etc.  When  we  considered  reasons  why  the  car  failed  to  brake,  we  listed  brake 
failure  or  ineffective  brakes  as  possibilities.  Brake  failure  was  represented  as  an  undeveloped 
event, not because it is an insignificant event, but because we have insufficient information as
to why brakes fail; domain expertise is required to further elaborate this event. We developed
the ineffective brakes event into two events, both of which must occur for the brakes to be ineffective:
the  car  must  be  travelling  too  fast  and  the  brakes  must  be  weak. 
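
The car example can be written down and evaluated mechanically. The sketch below assumes the gate types implied by the prose (or gates for the alternatives, an and gate for the ineffective brakes event); the dictionary layout and the function name are illustrative choices, not a standard fault tree notation.

```python
# Gate types described in the text: an AND gate needs all of its inputs,
# an OR gate needs at least one.
AND, OR = "and", "or"

# Each intermediate event maps to (gate, inputs). Names not listed here are
# leaves, i.e. basic or undeveloped events.
CAR_CRASH_TREE = {
    "car hits stationary object": (OR, ["driver does not see object",
                                        "car fails to brake"]),
    "driver does not see object": (OR, ["object just around a corner",
                                        "driver asleep at the wheel"]),
    "car fails to brake": (OR, ["brake failure", "brakes ineffective"]),
    "brakes ineffective": (AND, ["car travelling too fast", "brakes weak"]),
}


def occurs(event, tree, leaf_events):
    """Return True if `event` occurs given the set of leaf events that occur."""
    if event not in tree:
        return event in leaf_events
    gate, inputs = tree[event]
    results = [occurs(child, tree, leaf_events) for child in inputs]
    return all(results) if gate == AND else any(results)


top = "car hits stationary object"
print(occurs(top, CAR_CRASH_TREE, {"brakes weak"}))
print(occurs(top, CAR_CRASH_TREE, {"brakes weak", "car travelling too fast"}))
print(occurs(top, CAR_CRASH_TREE, {"driver asleep at the wheel"}))
```

Weak brakes alone do not reach the top event, because the and gate also requires excessive speed; a sleeping driver alone does, because only or gates lie on that path.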

As  can  be  seen,  the  development  of  a  fault  tree  is  a  consideration  of  the  possible  events  that 
may  lead  to  a  particular  undesirable  event.  Domain  expertise  is  necessary  when  developing 
fault  trees  since  this  provides  the  knowledge  of  how  similar  systems  have  failed  in  the  past. 
Knowledge  of  the  system  under  analysis  is  necessary  since  the  particular  system  may  have 
introduced  additional  failure  modes  or  overcome  failures  in  previous  systems. 

Fault tree analysis was initially introduced as a means of examining failures in hardware systems.
Leveson and Harvey extended the principle of fault tree analysis to software systems
[118].  Fault  trees  may  be  built  for  a  given  system  based  on  the  source  code  for  that  system. 
Essentially,  the  starting  place  for  the  analysis  is  the  point  in  the  code  that  performs  the  poten¬ 
tially  undesirable  outputs.  The  code  is  then  analyzed  in  a  backwards  manner  by  deducing  how 
the program could have got to that point with the set of values producing the undesirable output.
For each control construct of the programming language used, it is possible to create a
fault tree template that may be used as necessary within a fault tree. The use of templates
simplifies the question of "How can the program reach this point?" and reduces the possibility
of error in the analysis.

Figure 3-6: Example Fault Tree for a Car Crash


Fault  tree  analysis  need  not  be  applied  solely  to  programming  language  representations  of  the 
system.  Any  formally  defined  language  used  to  represent  the  system  may  be  analyzed  using 
fault  trees  and  templates  may  be  created  for  notations  used  at  different  stages  of  the  system 
development  life  cycle.  Later  in  this  report,  we  discuss  the  application  of  software  fault  tree 
analysis  to  system  specification. 
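
To make the idea of a template concrete, the sketch below builds one plausible template for an if-then-else statement: the undesired event can arise through either branch (or gate), and each branch requires both the branch condition and the branch body producing the event (and gate). The exact shape of the published templates may differ; the tuple layout, function names, and the code fragment analyzed are all assumptions made for illustration.

```python
def if_template(condition, then_stmt, else_stmt, event):
    """Hypothetical fault tree template for an if-then-else statement.

    Nodes are (kind, label, children) tuples, where kind is "or", "and",
    or "leaf".
    """
    return ("or", f"if-statement causes '{event}'", [
        ("and", "then-branch path", [
            ("leaf", f"'{condition}' is true before the statement", []),
            ("leaf", f"'{then_stmt}' causes '{event}'", []),
        ]),
        ("and", "else-branch path", [
            ("leaf", f"'{condition}' is false before the statement", []),
            ("leaf", f"'{else_stmt}' causes '{event}'", []),
        ]),
    ])


def render(node, depth=0):
    """Return the tree as indented lines, one event per line."""
    kind, label, children = node
    gate = "" if kind == "leaf" else f" [{kind.upper()}]"
    lines = ["  " * depth + label + gate]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical code fragment under analysis:
#     if pressure > limit: valve = "open"  else: valve = "closed"
tree = if_template("pressure > limit", 'valve = "open"', 'valve = "closed"',
                   "valve closed while pressure is high")
print("\n".join(render(tree)))
```

In a full analysis the leaves would themselves be expanded, working backwards through the preceding statements with the templates for those constructs.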


3.2.2  Event  Tree  Analysis 

Event  tree  analysis  is  an  inductive  technique  using  essentially  the  same  representations  as 
fault  tree  analysis.  Event  trees  may  even  use  the  same  symbols  as  fault  trees.  The  difference 
lies in the analysis employed rather than the representation of the trees.

The purpose of event tree analysis is to consider an initiating event in the system and consider
all  the  consequences  of  the  occurrence  of  that  event,  particularly  those  that  lead  to  a  mishap. 
This  is  contrasted  with  fault  tree  analysis  which,  as  has  been  described,  examines  a  system 
to discover how an undesirable event could occur and eventually leads back to some combination
of initiating events necessary to cause the failure of the system. Thus, event tree analysis
starts from a cause and analyzes its effects, while fault tree analysis starts from an effect
and analyzes its potential causes.

The  approach  taken  is  to  consider  an  initiating  event  and  its  possible  consequences,  then  for 
each  of  these  consequential  events  in  turn,  the  potential  consequences  are  considered,  thus 
drawing  the  tree.  It  may  be  that  additional  events  are  necessary  for  an  intermediate  event  to 
occur  and  these  may  also  be  represented  in  the  tree. 
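
The forward construction described above can be sketched as an enumeration of consequence chains. The example is hypothetical throughout (the events, the mishap set, and the dictionary representation are assumptions), and it assumes the consequence relation contains no cycles.

```python
def event_paths(initiating_event, consequences):
    """Enumerate every chain of consequences starting at the initiating event.

    `consequences` maps an event to the events it may directly lead to;
    events with no entry are treated as final outcomes. Assumes no cycles.
    """
    followers = consequences.get(initiating_event, [])
    if not followers:
        return [[initiating_event]]
    paths = []
    for nxt in followers:
        for tail in event_paths(nxt, consequences):
            paths.append([initiating_event] + tail)
    return paths


# Hypothetical example: consequences of a sensor failure.
CONSEQUENCES = {
    "temperature sensor fails": ["controller reads stale value",
                                 "controller detects failure"],
    "controller reads stale value": ["heater stays on", "heater stays off"],
    "controller detects failure": ["system shuts down safely"],
    "heater stays on": ["vessel overheats"],
}
MISHAPS = {"vessel overheats"}

for path in event_paths("temperature sensor fails", CONSEQUENCES):
    flag = "MISHAP" if path[-1] in MISHAPS else "ok"
    print(f"{flag:6} " + " -> ".join(path))
```

Even in this small example only one of the three chains ends in a mishap, which illustrates why much of the effort in event tree analysis may be spent on paths that turn out to be harmless.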

The  initiating  events  for  event  tree  analysis  may  be  both  desirable  and  undesirable  since  it  is 
possible  for  a  desirable  event  to  lead  to  an  undesirable  outcome.  This  means  that  the  choice 
of  initiating  events  is  the  range  of  events  that  may  occur  in  the  system.  This  may  lead  to  diffi¬ 
culty  in  deciding  which  events  should  be  analyzed  and  which  should  not  in  an  environment 
where  only  limited  resources  are  available  for  safety  analysis. 

Event  tree  analysis  is  forward  looking  and  considers  potential  future  problems  while  fault  tree 
analysis  is  backward  looking  and  considers  knowledge  of  past  problems. 

Event  tree  analysis  is  not  as  widely  used  as  fault  tree  analysis.  This  may  be  in  large  part  due 
to  the  difficulty  of  considering  all  of  the  possible  consequences  of  an  event  or  even  the  difficulty 
of  choosing  the  initiating  event  to  analyze.  One  reason  for  this  is  that  trees  may  become  large 
and  unmanageable  rapidly  without  discovering  a  possible  mishap.  Much  analysis  time  may  be 
wasted  by  considering  an  event  tree  from  a  given  event,  such  as  the  failure  of  a  sensor,  when 
that  event  may  never  lead  to  a  mishap.  This  may  be  contrasted  with  fault  tree  analysis  which 
is  directed  toward  the  goal  of  a  specific  failure. 

In systems where there is little or no domain expertise available (that is, wholly new systems),
event  tree  analysis  may  play  a  valuable  role  since  the  consequences  of  individual  component 
failures  may  be  analyzed  to  determine  if  a  mishap  might  occur,  and  what  that  mishap  might 
be.  In  systems  with  past  history,  fault  tree  analysis  would  appear  to  be  a  better  analysis  tech¬ 
nique. 

3.2.3  Failure  Modes  and  Effects  Analysis 

Failure  Modes  and  Effects  Analysis  (FMEA)  [53]  is  another  inductive  technique  and  attempts 
to  anticipate  potential  failures  so  that  the  source  of  those  failures  can  be  eliminated.  FMEA 
consists  of  constructing  a  table  based  on  the  components  of  the  system  and  the  possible  fail¬ 
ure  modes  of  each  component.  FMEA  is  not  an  additional  technique  that  engineers  have  to 
learn, but rather a disciplined way of describing certain features (the failure modes) of the components
and the effects these features have on the entire system.

The approach used is to create a table with the following columns: component, failure mode,
effect of failure, cause of failure, occurrence, severity, probability of detection, risk priority
number, and corrective action. Table 3-1 is an example FMEA for part of an engine mounting
system. We have considered only the single tie bar bracket component, and only a few of the possible
ways in which the bracket may fail.


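Table 3-1: Example Failure Modes and Effects Analysis Table

Component: Tie Bar Bracket

  Failure Mode:          Bracket fractures
  Effect of Failure:     Stabilizing function of tie bar removed. All engine motion
                         transferred to mountings
  Cause of Failure:      Inadequate specification of hole to edge distance
  Occurrence, Severity, Probability of Detection:
                         1, 10 (third value illegible)
  Risk Priority Number:  70
  Corrective Action:     Test suitability of specification

  Failure Mode:          Bracket corrodes
  Effect of Failure:     As above
  Cause of Failure:      Inadequate specification for preparation of bracket
  Occurrence, Severity, Probability of Detection:
                         10 (other values illegible)
  Risk Priority Number:  50
  Corrective Action:     Test suitability of specification

  Failure Mode:          Fixing bolts loosen
  Effect of Failure:     As above
  Cause of Failure:      Bolt torque inadequately specified
  Occurrence, Severity, Probability of Detection:
                         (values illegible)
  Risk Priority Number:  200
  Corrective Action:     Test for loosening

  Failure Mode:          Fixing bolts loosen
  Effect of Failure:     As above
  Cause of Failure:      Bolt material or thread type inadequate
  Occurrence, Severity, Probability of Detection:
                         10 (other values illegible)
  Risk Priority Number:  50
  Corrective Action:     Test suitability of specification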

For  each  component,  a  list  of  the  possible  failure  modes  is  created.  These  failure  modes  are 
used  to  populate  the  second  column  of  the  table.  The  effects  of  each  failure  are  considered 
and  entered  into  the  third  column.  Although  the  existing  literature  does  not  indicate  that  it 
should  be  done,  it  would  seem  that  use  of  event  tree  analysis  may  help  in  determining  the  pos¬ 
sible  effects  of  the  component  failure.  The  potential  causes  of  the  failure  mode  are  listed  in  the 
fourth  column  of  the  table  and  similarly,  though  not  mentioned  in  the  literature,  it  would  seem 
that  fault  tree  analysis  might  be  the  appropriate  technique  for  determining  causes  of  the  com¬ 
ponent  failure. 


The  engineer  is  then  required  to  enter  a  value  indicating  the  frequency  of  occurrence  of  the 
particular  cause  of  the  failure  mode.  For  existing  hardware  components,  statistical  data  may 
exist  to  accurately  predict  failure.  However,  in  most  cases,  particularly  for  software,  the  engi¬ 
neer  will  have  to  use  knowledge  and  experience  to  make  a  best  estimate  of  the  value.  The 
values  for  the  occurrence  field  should  lie  between  1  and  10,  with  1  being  used  to  indicate  very 
low probability of occurrence and 10 a near certainty.

Based on the determination of the possible effects of the failure mode, the engineer must estimate
a value that indicates the severity of the failure. Note that this is independent of the
probability  of  occurrence  of  the  failure,  but  is  simply  used  as  an  indicator  of  how  serious  the 
failure  would  be.  Again,  a  value  between  1  and  10  is  used,  with  1  being  used  to  indicate  a  mi¬ 
nor  annoyance  and  10  a  very  serious  consequence. 

The  next  field  is  the  detection  of  failure  field.  Here  the  engineer  must  estimate  the  chance  of 
the  failure  being  detected  before  the  product  is  shipped  to  the  customer.  It  may  well  be  that  for 
software  systems,  this  field  will  be  estimated  based  on  the  quality  of  the  testing  process  and 
the  complexity  of  the  component.  Again  a  score  between  1  and  10  is  assigned,  with  1  indicat¬ 
ing  a  near  certainty  that  the  fault  will  be  detected  before  the  product  is  shipped  and  10  being 
a  near  impossibility  of  detection  prior  to  shipping. 

The risk priority number is simply the product of the occurrence, severity, and failure detection
fields and provides the developers with a notion of the relative priority of the particular failure.
The higher the number in this field, the more serious the failure, which indicates
where more effort should be spent in the development process.
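
The arithmetic can be sketched as follows. The class, the field names, and the example rows are hypothetical and are not taken from Table 3-1; only the 1 to 10 scales and the product rule come from the description above.

```python
from dataclasses import dataclass


@dataclass
class FailureCause:
    failure_mode: str
    cause: str
    occurrence: int   # 1 = very low probability, 10 = near certainty
    severity: int     # 1 = minor annoyance, 10 = very serious consequence
    detection: int    # 1 = almost surely detected, 10 = almost undetectable

    def rpn(self):
        """Risk priority number: the product of the three 1-10 ratings."""
        for name in ("occurrence", "severity", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")
        return self.occurrence * self.severity * self.detection


# Hypothetical ratings, chosen only to illustrate the arithmetic.
rows = [
    FailureCause("Stale sensor value used", "Missing timeout check", 2, 9, 4),
    FailureCause("Log file fills disk", "No rotation policy", 6, 5, 3),
    FailureCause("Display flickers", "Redundant redraw", 3, 3, 2),
]

for row in sorted(rows, key=FailureCause.rpn, reverse=True):
    print(f"{row.rpn():4d}  {row.failure_mode}: {row.cause}")
```

Sorting by the product puts the failure causes that deserve the most development effort at the top of the list.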

The  final  field  of  the  FMEA  table  is  a  description  of  potential  corrective  action  that  can  be  taken. 
It  is  unclear  whether  this  field  has  any  meaning  in  software  systems  and  further  investigation 
should take place to determine if any meaningful information can be provided by the safety
engineer.  It  may  be  that  for  software  components,  corrective  action  will  be  the  employment  of 
techniques  such  as  formal  methods  for  fault  reduction  or  fault  tolerance  techniques  for  fault 
detection  and  masking. 

A closely related approach is the use of Failure Mode, Effects and Criticality Analysis (FMECA), which
performs the same steps as FMEA, but then adds a criticality analysis to rank the results. The
FMEA described does provide a way of ranking results; however, FMECA provides a more
formal process for performing the criticality analysis.

3.3  Summary 

The process of performing a safety analysis of a system is time consuming and employs many
techniques  all  of  which  require  considerable  domain  expertise.  It  is  clear  that  for  the  safest 
possible  systems,  the  best  available  staff  should  be  used  for  the  safety  analysis. 

There  would  appear  to  be  two  approaches  that  can  be  taken: 

1. Create a list of all hazards and for those with a sufficiently high risk perform
fault  tree  analysis  indicating  which  components  are  safety  critical.  Then  for 
those  components,  continue  to  apply  hazard  analysis  techniques  at  each 
stage  of  development. 

2.  Perform  an  FMEA  for  all  components  of  the  system,  potentially  using  fault 
tree  and  event  tree  analysis  to  determine  causes  and  effects  of  a  component 
failure  respectively.  Employ  the  best  development  techniques  (usually  more 
expensive) on those components with an unacceptably high criticality factor.

4 Development Techniques for Safety-Critical Software


We  have  discussed  techniques  that  help  the  system  developers  determine  the  safety  critical 
components  of  the  system.  It  is  important  that  the  development  process  be  one  that  does  not 
introduce  new  failures  that  may  lead  to  a  system  mishap.  This  chapter  outlines  the  techniques 
most  commonly  discussed  in  the  literature  that  avoid  the  introduction  of  errors  into  the  system 
during  the  development  process.  Note,  though,  that  at  each  level  of  representation  it  is  possi¬ 
ble  to  employ  hazard  analysis  techniques  such  as  fault  tree  analysis.  Such  an  analysis  is  not 
excluded by the techniques described in this chapter, nor does it replace the need for the techniques
described. The development process is complemented by analysis at each stage of development,
and for the safest systems, both approaches should be used.

There  is  a  widely  held  belief  that  formal  methods  should  form  part  of  the  development  process 
for  safety-critical  systems.  Experiments  have  been  reported  in  a  number  of  conferences  (such 
as  Daniels  [47])  which  support  this  belief.  Thus,  each  section  that  follows  will  contain  some 
discussion  concerning  the  relevance  of  formal  methods  to  the  particular  development  topic. 

It  should  be  noted,  though,  that  researchers  and  developers  do  not  see  formal  methods  as  the 
only  technique  that  should  be  employed;  rather,  formal  methods  should  be  used  in  conjunction 
with  existing  approaches  to  system  development.  It  is  the  combination  of  techniques  that  will 
lead  to  safer  systems. 

4.1  Requirements 

The  requirements  for  a  system  are  generally  presented  in  terms  of  natural  language  descrip¬ 
tion  of  the  function  of  the  system,  desired  performance  characteristics,  predetermined  design 
decisions  such  as  the  use  of  particular  hardware  or  software  packages,  and  many  other  non¬ 
functional  characteristics  such  as  the  maintainability  of  the  system. 

Because  the  natural  language  representation  is  often  ambiguous  and  incomplete,  only  limited 
analyses  may  be  performed.  Typically,  the  first  step  is  to  represent  the  requirements  in  a  no¬ 
tation  that  is  not  ambiguous  and  may  be  analyzed.  This  is  the  process  of  specification;  the  re¬ 
sult  is  a  specification  of  the  behavior  of  the  system.  Given  that  the  specification  is  in  a  notation 
that  may  be  unfamiliar  to  the  system  procurers,  the  specification  should  be  validated  with  the 
procurers  to  ensure  that  the  specification  is  a  true  representation  of  the  intended  system. 

4.1.1 Specification and Analysis

There  are  many  notations  that  may  be  used  to  specify  systems,  and  these  notations  have 
varying levels of formality. Standards such as the U.K. Ministry of Defence Standard MOD 00-55
(see Section 5.1.1) require that the specification be written in a formal notation. Other standards
offer other alternatives. We will concentrate on formal notations since these offer the
greatest opportunity for analysis.

There is no simple way to formalize the requirements to produce the specification. The specifiers
must read the requirements and translate them into statements written in the formal notation.
It is important for the specifiers to be expert in the domain in which the system is to be
deployed. The specification may be analyzed for inconsistencies and, to a limited extent,
incompleteness (such an analysis depends on the expertise of the analysts rather than the
specification technique used).

There  are  a  number  of  different  types  of  formal  specification  technique  that  may  be  used. 
Some  notations  are  better  suited  to  the  specification  of  concurrency  while  others  are  better 
suited  to  sequential  systems.  The  choice  of  notation  should  depend  on  the  system  being  spec¬ 
ified  and  the  expected  expertise  of  the  developers  who  will  have  to  read  and  understand  the 
specification. Leveson and others have described the use of Petri-nets [119] and also the use
of state machines [98] as appropriate specification notations that may be used to model the
system  and  analyze  the  system  for  unsafe  behaviors.  MOD  00-55  lists  eight  notations  that  may 
be  used  for  specification. 

Melhart  describes  the  use  of  Statecharts  for  the  specification  of  the  system  properties  and  then 
indicates  how  a  fault  tree  analysis  may  be  performed  on  the  Statechart  representations  of  the 
system [159].

What  is  common  to  each  of  the  approaches  is  that  the  requirements  are  formalized  using  a 
mathematically-defined  notation.  Subsequently  the  formal  representation  of  the  requirements 
is  analyzed  and  undesirable  properties  are  removed. 
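
As a minimal sketch of what analyzing a state-machine specification for unsafe behaviors can
look like, the following Python fragment explores every reachable state of a small model and
reports the event trace leading to each hazardous state. The railway-crossing model, its state
and event names, and the safety predicate are all hypothetical illustrations; they are not
taken from the cited work.

```python
from collections import deque

# Hypothetical specification: a state is (train position, gate position).
# Each (state, event) pair maps to a successor state; missing pairs are not enabled.
TRANSITIONS = {
    (("away", "up"), "approach"): ("near", "up"),
    (("near", "up"), "lower"): ("near", "down"),
    (("near", "down"), "enter"): ("crossing", "down"),
    (("near", "up"), "enter"): ("crossing", "up"),   # deliberate specification flaw
    (("crossing", "down"), "leave"): ("away", "down"),
    (("away", "down"), "raise"): ("away", "up"),
}
INITIAL = ("away", "up")


def is_hazardous(state):
    """Safety predicate: a train is on the crossing while the gate is up."""
    train, gate = state
    return train == "crossing" and gate == "up"


def find_hazard_traces(transitions, initial, hazardous):
    """Breadth-first search; returns the event trace to each reachable hazardous state."""
    traces = {initial: []}
    queue = deque([initial])
    hazards = {}
    while queue:
        state = queue.popleft()
        if hazardous(state):
            hazards[state] = traces[state]
        for (source, event), target in transitions.items():
            if source == state and target not in traces:
                traces[target] = traces[state] + [event]
                queue.append(target)
    return hazards


hazard_traces = find_hazard_traces(TRANSITIONS, INITIAL, is_hazardous)
```

Removing the flawed `enter` transition from the gate-up state and repeating the search finds
no hazardous state, which mirrors the cycle of analysis followed by removal of undesirable
properties.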

4.1.2  Validation 

One of the hard issues to handle in the formalization of the requirements is ensuring that the
specification describes the desired system. It is certain that the specification describes a
model of some system, but the question remains as to whether the model accurately represents
the desired system. There is no way to formally prove that the specification is a model of the
desired system. (It should be noted that this is true whether or not the specification is
written in a formal notation.)

The specification needs to be validated with the people who developed the requirements. The
validation process may take a number of forms. Some specifications are executable and the
system specification may be used to validate behavior. This is done by running the
specification on appropriate sets of inputs and determining if the behavior of the
specification is consistent with the desired behavior for the system. Some specifications are
not executable. However, for such specifications, it is possible to use the specification as a
model to predict how the system will behave given that the initial state and inputs to the
system are known. This can be checked against the notion of the correct state of the system
for validity. Usually, the prediction is done by a mathematical proof and by creating a formal
description of the resulting state. This is best validated by interpreting the mathematics
back into a natural language description of the state.

The specification may be compared with a mathematical model of any existing standards that
apply to the domain in which the system is to be deployed and it may be possible to
demonstrate that a system conforming to the model described by the specification meets (or
fails to meet) the definitions of safety described in any appropriate standards.
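
Validation by running an executable specification might be sketched as follows. The heater
interlock, its thresholds, and the scenario names are assumptions made purely for
illustration; the point is only that the behavior expected by the authors of the requirements
is written down independently and compared with the behavior the specification produces.

```python
# Hypothetical executable specification of a heater interlock.
# State: whether the heater is on. Input: measured temperature in degrees C.
LOW, HIGH = 60, 90  # illustrative thresholds, not taken from any standard


def spec_step(heater_on, temperature):
    """Return the next heater state required by the specification."""
    if temperature >= HIGH:
        return False          # must switch off at or above the upper limit
    if temperature < LOW:
        return True           # must switch on below the lower limit
    return heater_on          # otherwise hold the current state


def run_spec(initial_state, inputs):
    """Animate the specification over a sequence of inputs, returning each state."""
    states = []
    state = initial_state
    for value in inputs:
        state = spec_step(state, value)
        states.append(state)
    return states


def validate(scenarios):
    """Return the names of scenarios where specified and expected behavior differ."""
    return [name for name, (start, inputs, expected) in scenarios.items()
            if run_spec(start, inputs) != expected]


# Expected behavior as stated by the (hypothetical) authors of the requirements.
SCENARIOS = {
    "warm-up": (False, [20, 59, 60, 89], [True, True, True, True]),
    "overheat": (True, [89, 90, 75], [True, False, False]),
    "restart": (False, [75, 59], [False, True]),
}
mismatches = validate(SCENARIOS)
```

An empty `mismatches` list shows only that the specification agrees with the chosen
scenarios; it does not show that the specification models the desired system in general.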

4.2  Design 

Assuming that a formal specification has been created and validated with respect to system
safety, design of the system must be performed with two considerations in mind. The first
consideration should be that the design does not create new system hazards by adding
unintended function to the system or eliminating function described in the specification; the
second, that the representation of the system comes closer to an implementation, filling in
detail as necessary. This latter consideration is the usual design consideration and we will
not discuss it further in this document. Note also that, as for specifications, safety
analysis may be performed during the design process [38].

Looking at the correctness consideration for the design process, it is clear that the
specification must be written formally. Otherwise, the design, even if presented in a formal
notation, cannot be checked against the specification. (This is the same problem as checking a
formal specification against informally represented requirements.)

The  two  approaches  to  design  are: 

1. To perform the design using some appropriate design principles and notation
and then represent the result of the design process in a formal notation and
prove that the design satisfies the specification.

2. To successively transform the specification using design principles as a guide
to the selection of the transformations and demonstrate the correctness of
each transformation.

The advantage of the first approach is that the more familiar design notations may be
employed. However, there are a number of disadvantages.

1. The transformation from the design notation into the formal notation may lead
to errors that will only be detected when the formal representation of the
design is compared to the specification. A corollary is that all of the design
work will have been performed before a formal check that the design satisfies
the specification can be performed. If an error is introduced in the design
process much work may have to be redone.

2. The proof that the design satisfies the specification will be hard and
complicated since the two representations of the system will be dissimilar.

3. If corrections or enhancements are to be made, it may be less clear how to
correct all of the design documentation, since it may not be possible to infer
from which parts of the design particular pieces of the implementation have
been derived. The use of different notations complicates tracking of
specification through design.

The approach of successive transformation overcomes the disadvantages outlined above.
However, it also has disadvantages.

1. This is still an immature technology and although there have been a number
of publications on the topic [165], it is, at the time this report is being written,
unproven in large-scale applications. We can assume that, as time passes
and the transformational approach is employed more frequently, this
disadvantage will evaporate.

2. There will be many more representations of the system, each of which will be
similar to the preceding and succeeding representations. This places a much
greater burden on the development environment, particularly with respect to
tracking the connections between pieces. Some experiments have indicated
that a hypertext-based environment may overcome this problem.
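
The shape of the transformational approach can be sketched in a few lines of Python. Each
representation is derived from its predecessor by a small step, and each step is checked
against the representation immediately before it rather than against the original
specification. The functions and the bounded input domain are hypothetical, and the exhaustive
comparison over that domain is only a stand-in for the correctness proof that a real
transformation would require.

```python
from itertools import product


def spec(values):
    """Specification: the largest reading, stated as directly as possible."""
    return sorted(values)[-1]


def step_1(values):
    """Transformation 1: replace sorting by the built-in maximum."""
    return max(values)


def step_2(values):
    """Transformation 2: make the iteration explicit, closer to an implementation."""
    largest = values[0]
    for value in values[1:]:
        if value > largest:
            largest = value
    return largest


def equivalent(f, g, domain):
    """Bounded check that two representations agree on every input in the domain."""
    return all(f(list(x)) == g(list(x)) for x in domain)


# All non-empty sequences of length 1 to 3 over a small range of integers.
DOMAIN = [t for n in (1, 2, 3) for t in product(range(-2, 3), repeat=n)]

chain = [spec, step_1, step_2]
checks = [equivalent(a, b, DOMAIN) for a, b in zip(chain, chain[1:])]
```

Because adjacent representations are similar, each individual check is simple, but the number
of representations to be tracked grows with every step, as noted in the second disadvantage.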

4.3  Implementation 

It may be argued that the most important concern of implementation is conformance between
the executing code and the design, that is, that the executing code has the same semantics
as the lowest level of design. There are a number of aspects of implementation that affect
conformance: the development tools used, formal verification of implementation, and runtime
checking.

4.3.1  Development  Tools 

We have already touched on the idea that the development environment is, to some extent,
safety critical. It is important that the software that has been checked and found to be safe
is the software that is built and delivered. One aspect of the development environment of
particular importance is the version management system. Typically, many analyses are performed
at the implementation level on the code that is compiled into the system. Analyses such as
code reviews and software fault tree analysis make the assumption that the code that is used
to build the software is the code that has been analyzed. There are two consequences of this
assumption:

1. That a version management system exists and is used so that the system
integrators may have absolute confidence that the code that was analyzed is
the code used to perform the system build. A similar potentially hazardous
situation can occur if the code is built and tested and then small changes are
made and the resulting system is not put through the same testing process.
Examples of such failures occurred in 1991 when both AT&T and a number
of the Bell telephone exchanges failed and phone connections could not be
made. In both cases, seemingly minor changes were introduced into the
system after the testing process had been completed that introduced wholly
new failure modes for the systems.

2. That the compiler, assembler and linker produce an executable image that
has the same meaning as assumed at the level of the code. The implication
here is that the development tools should be formally verified and that the
code should be run on verified hardware. This is the idea behind the
Computational Logic Inc. stack and the ProCoS project.
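
The first consequence can be illustrated with a small sketch: a digest of every source unit is
recorded when the unit is analyzed, and the build is refused if any unit presented to it
differs from that record. The file names and contents are hypothetical, and a real version
management system would provide far more than this comparison.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest of a source unit, as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def record_manifest(sources: dict) -> dict:
    """Record the digest of every unit at the time it is analyzed."""
    return {name: digest(content) for name, content in sources.items()}


def check_build_inputs(manifest: dict, sources: dict) -> list:
    """Return the names of units that differ from, or are absent from, the analyzed set."""
    problems = []
    for name in sorted(set(manifest) | set(sources)):
        if name not in manifest or name not in sources:
            problems.append(name)
        elif digest(sources[name]) != manifest[name]:
            problems.append(name)
    return problems


# Hypothetical source units as they stood when the analyses were performed.
analyzed = {
    "valve.adb": b"procedure Close_Valve is ...",
    "main.adb": b"procedure Main is ...",
}
manifest = record_manifest(analyzed)

# A seemingly minor change made after analysis is detected before the build.
to_build = {**analyzed, "valve.adb": b"procedure Close_Valve is ... -- tweak"}
changed = check_build_inputs(manifest, to_build)
```

This addresses only the first consequence. It says nothing about whether the compiler,
assembler and linker preserve the meaning of the code, which is the concern of the second.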

4.3.2  Formal  Verification

One very strong approach to demonstrating conformance between the implementation and
the design is to formally verify that the implementation has exactly the same meaning as
described in the design.

Formal verification requires that the design be expressed in a formal notation (that is, a
notation with mathematically defined semantics) and that the semantics of the implementation
language are also formally defined. Note that only the semantics of the language constructs
that are used in the implementation need to be formally defined. Thus, it is possible to use a
subset of a language for which full formal semantics do not exist as long as the semantics of
the constructs in that subset are defined.

The process of formal verification, then, is one of proving a correspondence between the
statements of the program and the statements of the design. In many cases, the design is
expressed as a number of program pieces (module, procedure, function, sequence of
statements) with conditions on each piece describing the input and output states for that
piece. These descriptions are generally stated in terms of a predicate describing all of the
input states for which the program piece is expected to operate and a predicate on the output
states showing the relationship between the input states and the output states. Then, it is
possible to examine the code and construct a proof that the statements in the appropriate
program piece do implement the relationship between the input and output states for all the
inputs described by the input predicate.
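
The two predicates can be made concrete with a small sketch. The program piece below, an
integer square root, is a hypothetical example, as are the names of the predicates. The check
over a bounded range of inputs is not a proof; it only shows the form of the obligation that a
proof would have to discharge for every input satisfying the input predicate.

```python
def precondition(n):
    """Input predicate: the piece is only expected to operate on non-negative integers."""
    return isinstance(n, int) and n >= 0


def postcondition(n, r):
    """Output predicate relating the input state n to the output state r."""
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)


def integer_sqrt(n):
    """Program piece under examination: the largest r such that r * r <= n."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def counterexamples(limit):
    """Inputs below `limit` that satisfy the precondition but violate the postcondition."""
    return [n for n in range(limit)
            if precondition(n) and not postcondition(n, integer_sqrt(n))]


failures = counterexamples(2000)
```

A formal verification of this piece would instead argue from a loop invariant (here, that
`r * r <= n` holds on every iteration) together with the negated loop condition on exit, and
would therefore cover all inputs rather than a sample.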

It should be stated that the proof of correctness is not a trivial task and that mistakes may
be made. Thus, the use of tools to assist in the proof is an important part of the process.
Again, this approach depends on the compilation tools as it makes the assumption that the
executable image has exactly the same meaning as the meaning implied by the statements of the
program fragments using the formal semantics of those statements.

4.3.3  Runtime  Checking 

Another  approach  to  ensuring  conformance  between  the  implementation  and  the  design  is  to 
check  the  operation  of  the  system  dynamically.  There  are  two  classes  of  approach  that  may 
be  used  to  check  the  runtime  behavior  of  the  system:  the  development  of  self-checking  code 
or  the  development  of  an  independent  monitor.  In  both  cases,  the  checking  part  of  the  system 
will,  if  the  system  deviates  from  the  expected  behavior,  take  steps  to  avoid  a  hazard. 

There are two approaches that might be used to create self-checking code: have the
developers insert additional checks into the code or have the compiler insert the checks.

1. It is certainly possible for the developers to insert additional checks into the
code, which essentially monitor the state of the system and, if an erroneous
state is detected, take actions to correct the state. The effectiveness of this
approach, however, is questionable: an experiment by Leveson and others
[135] indicates that, in many cases, the self-checking code failed to detect
known errors.

2. The compiler may be used to insert checks into the code based on the design
specification. An example of this approach is the ANNA language (Annotated
Ada) [147]. Assuming that the design is represented in ANNA and the code is
in Ada, then the compiler will insert runtime checks into the executable code
based on the ANNA design specification. These checks will raise an
exception (which would need to be trapped and appropriate hazard-avoiding
action taken) when the state of the system is different from the ANNA
description. The advantage of this approach is that the checks are inserted
automatically at every point in the code where there is an applicable ANNA
description (for example, procedure entry and exit or even at the level of a
variable changing value). The disadvantage of this approach is that it may
add considerable processing time to the execution of the system.
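
The second approach can be imitated in Python, although the following is only an analogy and
is not ANNA syntax. A decorator plays the role of the compiler by wrapping a function with
entry and exit checks derived from its annotations, and the caller traps the resulting
exception and takes a hazard-avoiding action. The control law, its deliberate defect, and all
names are hypothetical.

```python
import functools


class AnnotationViolation(Exception):
    """Raised when the running state disagrees with the design annotation."""


def checked(pre, post):
    """Wrap a function with entry and exit checks derived from its annotations."""
    def wrap(func):
        @functools.wraps(func)
        def wrapper(*args):
            if not pre(*args):
                raise AnnotationViolation(f"entry check failed in {func.__name__}")
            result = func(*args)
            if not post(result, *args):
                raise AnnotationViolation(f"exit check failed in {func.__name__}")
            return result
        return wrapper
    return wrap


@checked(pre=lambda demand: 0 <= demand <= 100,
         post=lambda opening, demand: 0 <= opening <= 100)
def valve_opening(demand):
    """Hypothetical control law with a deliberate defect above 80 percent demand."""
    return demand * 1.5 if demand > 80 else demand


def safe_valve_opening(demand):
    """Trap the violation and take a hazard-avoiding action (here, close the valve)."""
    try:
        return valve_opening(demand)
    except AnnotationViolation:
        return 0
```

Every call pays for two additional predicate evaluations, which is a small-scale picture of
the processing cost noted above.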

The other approach described in the literature is that of a monitor that acts independently of
the software and checks the outputs from the software. An example of the monitor approach
is described by Leveson in papers on Murphy [124]. If the outputs from the software are at
variance with those expected by the monitor, then the monitor may either substitute its own
outputs or invoke some other piece of software that will return the system to a safe state.
The monitor approach may be used to examine just the outputs of the software or may be
extended to the state of the entire system.
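
A monitor that examines only the outputs of the software might be sketched as follows. The
controller, the output limits, and the safe value are hypothetical, and the sketch is not a
description of the Murphy work cited above.

```python
SAFE_OUTPUT = 0.0          # hypothetical safe actuator command
MAX_RATE = 5.0             # hypothetical limit on the change in output per cycle


def controller(setpoint, measurement):
    """Stand-in for the monitored software; the gain is deliberately too aggressive."""
    return 4.0 * (setpoint - measurement)


def monitor(previous_output, proposed_output):
    """Independent check on the output; substitutes a safe value on disagreement."""
    within_range = 0.0 <= proposed_output <= 10.0
    within_rate = abs(proposed_output - previous_output) <= MAX_RATE
    if within_range and within_rate:
        return proposed_output, False
    return SAFE_OUTPUT, True


def run(setpoint, measurements):
    """Drive the controller through the monitor, recording each intervention."""
    output, log = 0.0, []
    for measurement in measurements:
        output, intervened = monitor(output, controller(setpoint, measurement))
        log.append((output, intervened))
    return log


trace = run(setpoint=10.0, measurements=[9.0, 8.5, 2.0])
```

The monitor here shares no code with the controller, but its own limits and its choice of safe
output are exactly the kind of logic that must itself be shown to be correct.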

The operation of the monitor is very important in this type of approach and the developers
must be able to guarantee that the monitor will always act correctly. That is, the computations
within the monitor must be correct with respect to the expectations of the monitor function
and any hazard-avoiding action the monitor takes must also operate correctly. If the monitor
or the avoidance actions fail to function correctly, the system may still be hazardous due to
failures in the software. Indeed, the system may be more hazardous since the monitor could
take over control when the monitor is in an erroneous state and lead to the occurrence of a
mishap.

5  Standards


This chapter outlines the standards that pertain to the development of software for
safety-critical systems. The chapter concludes with a section describing the possible negative
effects that standards have on the development community.

5.1  MOD  00-55  &  MOD  00-56 

These two standards were produced by the U.K. Ministry of Defence (MOD) in 1991, though
draft versions were available earlier. These standards are labelled as interim defense
standards. This means that they are not yet in full force. The standards may evolve further
before they are fully enforced. Essentially, they may be treated as a statement of intent,
that is, that the MOD expects that at some time over the next five years, the standards as
written, or a variation on these standards, will be enforced.

Although each of these documents is a standard in its own right, they are generally considered
as a cooperating pair of standards. MOD 00-55 concerns the procurement of safety-critical
software and MOD 00-56 concerns the hazard analysis and safety classification of computer
hardware and software.

5.1.1  MOD  00-55 

This standard [163] covers the procurement of safety-critical software for Ministry of Defence
(UK) systems. Software is determined to be safety critical using safety integrity requirements
that are determined according to MOD 00-56. The standard is in two parts, requirements and
guidelines.

The standard describes a software development process where verification and validation are
integral parts. Formal methods, dynamic testing and static path analysis are techniques
required by the standard to achieve high levels of safety integrity. It is stressed that the
standard only applies to safety-critical software. The requirements of the standard have three
major sections: general, safety management, and software engineering practices requirements.

First,  the  general  section  introduces  the  standard,  and  the  scope  of  the  standard.  It  should  be 
noted  that  the  standard  disclaims  responsibility  for  liability  even  if  the  standard  is  followed 
completely.  The  general  section  also  introduces  definitions  used  throughout  the  document. 

The second section details safety management. The standard requires various management
practices to be employed in the development of safety-critical systems. One example is that a
specific individual is required to be responsible for all safety issues throughout the life of
the system. Another example is the requirement to identify safety-critical components as soon
as possible and to carry out, at each stage of design, a hazard analysis and safety risk
assessment (in accordance with MOD 00-56) to identify all potential failures that may cause
new hazards. Associated with this requirement is a further requirement to establish the
correct safety integrity level for all system components.

The standard requires that a risk analysis be performed that demonstrates that the techniques
and tools described in the safety plan are appropriate to the type of system being developed
and that development can be undertaken with acceptable risk to the project success. The risk
analysis must be performed early in the project life cycle and be re-performed if any of the
assumptions on which it is based change.

The standard requires that the verification and validation (V&V) team be independent of the
design team. The V&V team must prepare a verification and validation plan to verify the
safety-critical software by using dynamic testing and by checking the correctness of formal
arguments presented by the design team.

There is a requirement for an independent safety auditor to be appointed. The position of the
independent safety auditor is created under a separate contract and, if at all possible, the
same individual is expected to act as independent safety auditor throughout the system
development. The independent safety auditor must be commercially and managerially separate
from the design team and will produce an audit plan at the start of the project that will be
updated at the start of each subsequent project phase. The independent safety auditor oversees
all work that influences safety integrity and periodically audits the project to ensure
conformance to the safety plan.

A safety plan is to be developed in the earliest phases of the project, no later than the
project definition phase, and is to be updated at each subsequent phase of the project. The
safety plan:

•  Shows  the  detailed  safety  planning  and  control  measures  that  will  be  used. 

•  Contains  descriptions  of  the  management  and  technical  procedures  used  for 
the  development  of  the  safety-critical  software. 

•  Describes  the  resources  and  organizations  required  by  the  standard.

•  Identifies  the  key  staff  by  name. 

The standard requires that a safety records log be maintained which contains evidence that
the required safety integrity level is achieved. The safety records log includes the results
of hazard analyses, modelling reports, and the results of checking the formal arguments.

The standard describes requirements on documentation, deliverable items, and configuration
management; sets out the requirements on certification and acceptance into service; and
requires that the design team submit a safety-critical software certificate, signed by the
design team and counter-signed by the independent safety auditor, providing a clear,
unambiguous and binding statement that the software is fit for use and conforms to the
requirements of the standard.

The third section of the requirements describes the software engineering practices that are to
be used by the developers of the safety-critical software. This section discusses issues
relating to specification, design, coding and reuse.

The standard states that the first step in the design of safety-critical software is the
production of a software specification from the software requirements specification. The
software specification must include a specification using the notation of an approved formal
method and an English commentary (including any appropriate engineering notations) on the
specification. The design team is required to check the formal specification for syntactic and
type errors using an approved tool. The team is also required to construct proofs showing that
the specification is internally consistent. Further, the design team is required to validate
the software specification by means of animating the formal specification. The animation is
performed by the construction of various formal arguments and by the execution of an
executable prototype derived from the software specification.

The standard requires that each step of the design be described using a design description.
The design description comprises a formal description of the design using an approved formal
method and an English commentary on the design. The standard requires that safety-critical
software be designed so that it is easy to justify that the design meets the specification.
This may mean using short, uncomplicated software and may inhibit the use of concurrency,
interrupts, floating point, recursion, or a number of other aspects of programming. The design
team is required to construct formal arguments demonstrating that the formal design satisfies
the formal specification.

The standard requires that coding standards that lead to clear, analyzable code be used. The
code must be analyzed using formal arguments and static path analysis. The implementation
language used must have various characteristics including block structure, strong typing, a
formally defined syntax and a well understood semantics. The design team must use a static
path analysis tool to check control flow (including redundant code), data use and information
flow. The team must also create formal arguments that prove that the code satisfies the formal
design.

Much of the development requires the use of formal arguments. These may either be formal
proofs or rigorous arguments, the former being a complete mathematical proof and the latter
being the outline of a proof. (It is expected that formal proofs will be required most often
but there are cases where a rigorous argument would be sufficient.)

There are requirements for the performance of dynamic testing and for reusing existing
software. The latter requirements state that there must be agreement from the safety assurance
authority, the program manager, and the independent safety auditor before the software may
be reused. Further, if necessary, the design team may have to formally specify and verify the
software being reused.

The standard requires that all tools used in the development of safety-critical software have
a sufficient safety integrity level to ensure that they do not jeopardize the safety integrity
of the safety-critical software. The development tools are assigned their integrity levels
according to MOD 00-56.

The second part of MOD 00-55 is guidance on the requirements stated in the first part. It
elaborates on the requirements to make the achievement and assessment of conformance to the
requirements easier. Further, it provides technical background to the requirements.


The system should be designed so that the safety-critical software is isolated as far as
possible. This isolation minimizes the amount of software that has to be developed to the high
levels of safety integrity. This is particularly important for software that does not
implement a safety function, but whose failure, if it is tightly coupled with safety-critical
software, may cause the safety-critical software to fail.

The standard suggests that the independent safety auditor should be a chartered engineer
and have a minimum of five years’ experience with safety-critical software and its
implementation. The auditor should also be experienced in the methods that the design team
proposes to use.

The standard offers guidance on the safety reviews, which include formal reviews such as
Fagan inspections and various checks on the English commentaries, including spelling checks
and checks for unexpected words in the document (i.e., words that pass the spelling check but
are erroneous).

Criteria for the selection of the formal methods include: that the notation should be formally
defined, that the method should provide guidance on strategies for building a verifiable
design, that there are case studies in the literature demonstrating successful industrial use,
and that courses, textbooks, and tools should be available.

Guidance is also offered on the use of the formal method. This includes guidance on checks
to be performed on the formal specification as well as the generation and discharge of proof
obligations through formal reasoning.

As stated in the discussion of the first part, the standard recommends that various
programming techniques be avoided. The second part offers reasons why the particular
programming techniques should be avoided. The general reason is that use of certain
programming techniques makes some formal arguments harder to prove.

Guidance is offered on the type of programming language to be used. It is strongly suggested
that a conventional procedural language be used rather than assembly language or other
unconventional languages. There is guidance on how to perform static path analysis and how to
review the results of that analysis.

Tools are discussed, as well as a scheme for tool integrity levels. This is a four-level
scheme, comparable to the safety integrity level scheme. Tools are also classed as
transformational (such as a compiler), V&V (such as static path analyzers), clerical (such as
editors), or infrastructure (such as the operating system or configuration management).

5.1.2  MOD  00-56

The purpose of this standard is to identify, evaluate and record the hazards of a system to
determine the maximum tolerable risk from it, and to facilitate the achievement of a risk that
is as low as reasonably practicable and below the maximum tolerable level [164]. This activity
will determine the safety criteria and a reasonable and acceptable balance between the
reduction of risk to safety and the cost of that risk reduction.

The standard uses a hazard log, which is described as the principal means for establishing
progress on the resolution of intolerable risks. Using the hazard log, the standard helps the
contractors determine whether the system can cause accidents and, if the system is hazardous,
whether the risk from the system is tolerable. The hazard log also provides a mechanism to
identify and track critical components. The hazard log is initiated during the initiation
phase of the life cycle and is updated both as hazards are discovered and as previously
identified hazards are eliminated or their associated risk reduced to a tolerable level. The
hazard log is reviewed on a regular basis through the project life cycle.
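
To make the role of the hazard log concrete, a minimal in-memory sketch is given below. The
field names, the phases, and the simplification of the risk classes to three strings are
assumptions made for illustration; the standard itself defines what a hazard log must contain.

```python
from dataclasses import dataclass, field


@dataclass
class Hazard:
    identifier: str
    description: str
    risk_class: str                      # "intolerable", "undesirable", or "tolerable"
    components: list = field(default_factory=list)
    history: list = field(default_factory=list)


class HazardLog:
    """Hypothetical hazard log; structure and names are illustrative only."""

    def __init__(self):
        self.hazards = {}

    def record(self, hazard, phase):
        """Enter a newly discovered hazard, noting the life-cycle phase."""
        hazard.history.append((phase, "identified", hazard.risk_class))
        self.hazards[hazard.identifier] = hazard

    def reclassify(self, identifier, risk_class, phase):
        """Update a hazard whose risk has been reduced (or has increased)."""
        hazard = self.hazards[identifier]
        hazard.risk_class = risk_class
        hazard.history.append((phase, "reclassified", risk_class))

    def open_items(self):
        """Hazards whose risk has not yet been reduced to a tolerable level."""
        return sorted(h.identifier for h in self.hazards.values()
                      if h.risk_class != "tolerable")

    def critical_components(self):
        """Components implicated in any hazard that is still open."""
        return sorted({c for h in self.hazards.values()
                       if h.risk_class != "tolerable" for c in h.components})


log = HazardLog()
log.record(Hazard("H1", "Valve fails open", "intolerable", ["valve driver"]), "initiation")
log.record(Hazard("H2", "Display freezes", "undesirable", ["display task"]), "design")
log.reclassify("H1", "tolerable", "design")
```

Keeping the history of each entry, rather than overwriting it, is what allows the log to be
reviewed for progress throughout the project life cycle.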

The standard uses the notion of four classes of risk. These are determined by classifying each
hazard in one of the four categories of accident severity (catastrophic, critical, marginal,
and negligible), assigning one of six probability levels to the hazard, and using a table to
determine the risk class (intolerable, undesirable, tolerable if the project safety review
committee agrees, and tolerable using normal project reviews).
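
The table lookup can be sketched as follows. The four severity categories and the four risk
classes are those named above. The names of the six probability levels and every entry in the
table are placeholders assumed for illustration only; the table in the standard itself must be
consulted for any real classification.

```python
SEVERITIES = ["catastrophic", "critical", "marginal", "negligible"]

# Risk classes named in the text, from worst to best.
A = "intolerable"
B = "undesirable"
C = "tolerable with committee agreement"
D = "tolerable with normal project review"

# ILLUSTRATIVE table only: one row per (assumed) probability level,
# one column per severity category in the order given by SEVERITIES.
RISK_TABLE = {
    "frequent":   [A, A, A, B],
    "probable":   [A, A, B, C],
    "occasional": [A, B, C, C],
    "remote":     [B, C, C, D],
    "improbable": [C, C, D, D],
    "incredible": [D, D, D, D],
}


def risk_class(severity, probability):
    """Look up the risk class for a hazard's severity category and probability level."""
    return RISK_TABLE[probability][SEVERITIES.index(severity)]


example = risk_class("critical", "occasional")
```

Holding the table as data rather than as branching logic keeps the classification rule in one
reviewable place, which matters when the table is itself subject to safety review.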

The standard requires that for every identified hazard, a preliminary hazard analysis, system
change hazard analysis, and a safety review must be performed. For hazards that are
intolerable or undesirable an independent safety audit is required.

There are five approaches in decreasing order of preference to be used to reduce the risk
associated with a hazard: re-specification, redesign, incorporation of safety features,
incorporation of warning devices, and operating and training procedures.

The standard requires that the hazard classification depends not only on the system being
developed, but also on the systems with which it will interact and the environment in which it
will operate. If the system is used in a new environment or will interact with different
systems, the analyses must be re-performed. The standard recognizes that many parts of the
original analyses may still be useful in the new environment; however, they cannot be used as
a substitute for analysis in the new environment.

The standard describes various forms of analysis to be performed. Preliminary hazard
identification may be initiated using data from similar systems and enhanced using a hazard
and operability analysis or an alternative technique that satisfies certain criteria.
Preliminary hazard analysis sets the boundaries for the system; it identifies the system
hazards based on the preliminary hazard identification and the requirements. The results are
recorded in the hazard log. Potential accidents are identified and categorized with a risk
class. The system hazard analysis includes a functional analysis, which determines the hazards
due to correct or incorrect function of the system; a zonal analysis, which considers the
consequences of failures in adjacent systems; and a component failure analysis, which requires
a failure modes and effects analysis of the components. The contractors must also perform a
system risk analysis which associates the identified hazards and possible accident sequences
with their risk classes. The system risk analysis also requires that a failure probability
analysis be carried out using fault tree analysis.

For  all  changes,  the  contractor  is  required  to  carry  out  a  system  change  hazard  analysis  to  de¬ 
termine  that  new  hazards  are  not  being  introduced  by  the  change  or  that  existing  hazards  are 
not  increased  in  risk. 

Any  properties  of  the  system  intended  to  remove  the  intolerable  hazards  and  reduce  other 
risks  are  known  as  safety  features  and  must  be  documented  in  the  specification  and  design 
documents. 

The standard uses the concept of safety integrity level to classify functions according to how
much effect the function has on system hazards. Tables are presented that help the
contractors determine the safety integrity level for functions. The concept is also used to discuss
high-level functions and the requirements on implementation of a high-level function with a
particular safety integrity level from low-level functions with differing safety integrity levels.

The  standard  also  makes  requirements  on  the  management  of  the  program.  There  is  a  re¬ 
quirement  that  the  program  has  a  project  safety  engineer  with  sufficient  seniority  and  authority 
to  represent  the  development  organization.  The  project  safety  engineer  is  responsible  for  all 
safety  matters  including  signing  statements  of  risk  and  component  criticality.  In  addition,  for 
any  program  with  either  catastrophic  or  critical  hazards,  an  independent  safety  auditor  must 
be  appointed.  The  independent  safety  auditor  must  have  full  access  to  the  results  of  hazard 
analyses  and  safety  risk  assessments.  The  auditor  must  be  independent  of  the  development 
organization  and  should  not  be  changed  during  the  development  of  the  system  without  good 
reason. 

The  standard  requires  that  all  objects  produced  by  hazard  analysis  or  safety  risk  assessment, 
including  relevant  data,  be  held  under  configuration  control  to  satisfy  certain  standards. 

5.1.3  Summary 

These  two  standards  offer  a  very  strong  statement  on  the  development  of  safety-critical  soft¬ 
ware.  Essentially,  they  state  that  until  demonstrated  otherwise,  all  software  is  assumed  to  be 
safety  critical.  Once  software  has  been  analyzed,  it  may  be  assigned  a  safety  integrity  level 
which  determines  the  type  of  effort  required  for  the  development  of  that  software.  Software 
with  a  high  safety  integrity  level  must  be  specified,  designed,  and  implemented  using  appro¬ 
priate  formal  methods  with  formal  proofs  of  correctness  being  presented  between  all  levels  of 
the  software. 

5.2 MIL-STD-882B


The MIL-STD-882B standard [54], developed by the US Department of Defense in 1977 and
updated in 1984, requires that contractors establish and maintain a formal system safety
program that ensures that: safety consistent with the mission is designed into the system; hazards
associated with the system are identified, evaluated, and eliminated or the associated risk is
reduced to an acceptable level; historical data concerning failures is used; and significant
safety data is recorded for use in other systems.

Design  requirements  are  described  which  state  that  the  first  aim  of  the  contractors  is  to  elimi¬ 
nate  hazards  or  at  least  to  reduce  the  risk  associated  with  the  hazard  through  the  design  pro¬ 
cess.  The  designers  are  required  to  isolate  hazardous  substances,  components,  or  operations 
from  the  rest  of  the  system.  The  designers  are  also  required  to  design  software-controlled  or 
software-monitored  functions  to  minimize  the  initiation  of  hazardous  events  or  mishaps. 

A hierarchy of safety procedures is described. In decreasing order of importance, these
procedures are: design for minimum risk, incorporate safety devices, provide warning devices,
and develop procedures or training. Hazards are also categorized into four levels:
catastrophic, critical, marginal, and negligible. The standard states that for hazards that are either
catastrophic or critical, unless a waiver is granted, a safer design or safety devices must be used
to reduce the risk.

The standard recognizes that at the start of development quantitative values for the probability
of an event occurring will not be available. Instead, qualitative values may be used, though the
standard requires that a rationale for the choice of the probability level be given. The
qualitative probability levels are divided into five categories: frequent, probable, occasional,
remote, and improbable. The standard then describes three task areas concerned with program
management and control, design and evaluation, and software hazard analysis.

The  first  task  area,  program  management  and  control,  includes  a  number  of  tasks.  These 
tasks  require  the  contractor  to  create  a  system  safety  program  plan.  This  plan  describes  the 
tasks  and  activities  of  system  safety  engineering  and  system  safety  management.  It  describes 
the tasks of the system safety organization and the authority it holds. The task area outlines the
task  that  verifies  that  the  safety  organization  has  done  its  job  correctly.  Other  tasks  in  the  man¬ 
agement  area  require  that  the  contractor  participate  in  any  system  safety  groups  that  are 
formed.  The  task  area  also  requires  that  the  contractor  maintain  a  hazard  log  that  documents 
all  hazards  from  the  time  the  hazard  is  identified  to  the  time  when  the  hazard  is  eliminated  or 
the associated risk is reduced sufficiently. Finally, the standard also requires that the
contractor report the status of hazards at periodic intervals, and defines levels of qualifications that
key safety engineers should hold.
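
A hazard log of the kind described above might be represented as follows. This is a minimal
sketch rather than anything prescribed by the standard; the class names, field names, and
status values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, List, Optional


class HazardStatus(Enum):
    OPEN = "open"
    ELIMINATED = "eliminated"
    RISK_REDUCED = "risk reduced to an acceptable level"


@dataclass
class HazardEntry:
    """One hazard, tracked from identification until it is closed."""
    hazard_id: str
    description: str
    identified_on: date
    status: HazardStatus = HazardStatus.OPEN
    closed_on: Optional[date] = None
    history: List[str] = field(default_factory=list)


class HazardLog:
    def __init__(self) -> None:
        self._entries: Dict[str, HazardEntry] = {}

    def record(self, entry: HazardEntry) -> None:
        """Add a newly identified hazard to the log."""
        if entry.hazard_id in self._entries:
            raise ValueError(f"duplicate hazard id: {entry.hazard_id}")
        self._entries[entry.hazard_id] = entry

    def close(self, hazard_id: str, status: HazardStatus, on: date, note: str) -> None:
        """Mark a hazard as eliminated or as having its risk reduced."""
        if status is HazardStatus.OPEN:
            raise ValueError("a hazard cannot be closed with status OPEN")
        entry = self._entries[hazard_id]
        entry.status = status
        entry.closed_on = on
        entry.history.append(f"{on.isoformat()}: {note}")

    def open_hazards(self) -> List[HazardEntry]:
        """Hazards still open, e.g. for the periodic status report."""
        return [e for e in self._entries.values() if e.status is HazardStatus.OPEN]
```

Keeping closed entries in the log, rather than deleting them, preserves the record from
identification to closure that the task area asks for.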

The  second  task  area  describes  tasks  associated  with  performing  various  analyses  for  safety. 
The  tasks  required  are  those  involved  with  the  preparation  of  a  preliminary  hazard  list  by  per¬ 
forming  a  preliminary  hazard  analysis  which  should  be  started  either  during  concept  explora¬ 
tion  or  as  early  as  possible  during  development.  The  contractors  are  required  to  perform  a 
subsystem hazard analysis which should include issues of safety relating to the possible
modes  of  failure  including  those  caused  by  reasonable  human  error  or  single-point  equipment 
failure.  The  contractors  are  also  required  to  perform  system  hazard  analysis,  operation  and 
support  hazard  analysis,  safety  assessments  and  software  hazard  analysis.  This  latter  analy¬ 
sis,  though  required  in  the  design  task  area,  is  the  focus  of  the  third  task  area. 

The  third  task  area  concentrates  on  software  hazard  analysis.  The  standard  states  that  it  is  to 
be  used  in  conjunction  with  other  standards  such  as  DOD-STD-2167  and  MIL-STD-483A.  The 
task  area  requires  that  the  software  be  analyzed  at  the  following  levels:  software  requirements, 
top-level  design,  detailed  design,  and  code  level.  Each  of  these  analyses  should  check  for  in¬ 
put/output  timing,  multiple  event,  out  of  sequence  event,  wrong  event,  failure  of  event,  inap¬ 
propriate  magnitude,  adverse  environment,  deadlock,  or  hardware  failure  sensitivities.  Fur¬ 
ther,  at  the  code  level,  the  contractor  is  required  to  analyze  the  software  for  implementation  of 
the  safety  criteria,  and  for  combinations  of  hardware  or  software  or  transient  errors  that  could 
cause  hazardous  operation.  The  contractor  must  also  perform  flow  analysis  of  the  code,  en¬ 
sure  that  there  is  proper  error  default  handling,  and  that  there  are  fail-safe  or  fail-soft  modes. 
The  contractor  is  also  required  to  perform  software  safety  testing,  a  user  interface  analysis 
and,  for  every  change  to  be  made,  a  software  change  hazard  analysis  to  determine  the  impact 
on  safety  of  a  proposed  change. 

In summary, this standard describes many tasks that contractors must perform. It should be
noted that the standard does not describe specific techniques that must be used; rather, it
outlines the tasks and allows the program manager to choose techniques or, if the contractor
chooses techniques, requires that the program manager approve of the techniques. The
standard takes a systems view of safety and discusses the particular approaches to be taken if
there is a significant software portion to the system.

In  the  rationale,  the  standard  lists  various  techniques  for  ensuring  safety.  These  include  soft¬ 
ware  fault  tree  analysis,  software  sneak  analysis,  code  walk-throughs  and  Petri  net  analysis. 
The  rationale  recognizes  that  these  techniques  have  different  strengths  and  weaknesses  and 
states  that  a  thorough  software  hazard  analysis  will  require  the  application  of  more  than  one 
of  the  techniques. 

5.3 DO-178A & MOD 00-31

MOD  00-31  [162],  developed  by  the  MOD  in  1987,  is  a  standard  relating  to  the  development  of 
safety-critical  software  for  airborne  systems.  The  standard  relies  on  DO-178A  [188],  a  stan¬ 
dard  developed  by  the  Radio  Technical  Commission  for  Aeronautics  in  1985,  and  makes  some 
minor adjustments to that standard. Given the later publication of MOD 00-55 and MOD 00-56,
it must be assumed that this standard has been superseded by the stronger standards,
which cover procurement and development of safety-critical software for MOD applications,
wherever the avionics system software is judged safety critical.

The rest of this section deals with RTCA/DO-178A. This standard describes techniques and
methods that may be used to develop software for airborne systems. The standard does not
specify requirements: it just offers guidance on techniques that help the developer meet the
requirements of government regulatory agencies. The standard includes a caveat that with the
current state of knowledge of the techniques described in the standard, the recommended
techniques may not be sufficient to ensure that safety and reliability targets are achieved; thus,
other techniques may need to be employed.

The  standard  offers  guidelines  to  determine  the  level  of  criticality  of  the  software,  techniques 
for  software  development  and  configuration  management,  documentation  guidelines,  and  sys¬ 
tem  design  guidelines. 

The certification criteria for the system depend on the significance of the functions to flight
safety. Determining the criticality category is the first step in determining the certification
requirements. There are three categories:

•  Level  1.  Critical,  a  failure  will  prevent  the  safe  operation  of  the  aircraft. 

•  Level  2.  Essential,  a  failure  reduces  the  capability  of  the  aircraft  or  the  ability 
to  handle  adverse  conditions. 

•  Level  3.  Non-essential,  a  failure  does  not  significantly  reduce  the  capability 
of  the  aircraft. 

The  most  critical  function  of  the  system  determines  the  category  of  the  whole  system  unless 
that  system  has  been  partitioned  into  elements  of  different  criticality  categories. 
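
The rule that the most critical function sets the category can be sketched as follows. This is
a minimal illustration; the function and partition names are hypothetical, and the standard
itself defines no such code.

```python
from enum import IntEnum
from typing import Dict, Iterable


class Criticality(IntEnum):
    # Lower number means more critical, matching Level 1 to Level 3 above.
    CRITICAL = 1
    ESSENTIAL = 2
    NON_ESSENTIAL = 3


def system_category(function_levels: Iterable[Criticality]) -> Criticality:
    """An unpartitioned system takes the category of its most critical function."""
    return min(function_levels)


def partition_categories(
    partitions: Dict[str, Iterable[Criticality]]
) -> Dict[str, Criticality]:
    """A partitioned system is categorized element by element."""
    return {name: system_category(levels) for name, levels in partitions.items()}
```

For example, a system containing one Level 1 function and several Level 3 functions is
Level 1 as a whole, unless the Level 3 functions are placed in a separate partition.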

The  development  section  helps  define  a  certification  plan  and  software  development  activities 
to  obtain  software  at  a  given  criticality  category  or  software  level,  the  latter  being  analogous  to 
criticality  categories.  The  standard  takes  the  view  that  the  regulatory  agencies  are  primarily 
interested  in  the  resulting  level  of  safety  and  not  in  the  way  the  software  was  developed. 

The  description  of  development  techniques  follows  the  "waterfall"  model  and  discusses  the 
documents  to  be  produced,  the  verification  activities,  and  the  assessment  activities  for  each 
phase  of  development.  No  particular  techniques  are  discussed;  however,  outlines  are  given 
which  suggest  techniques  consistent  with  current,  good  software  development  practice. 

The standard states that the practices for software configuration management are derived
from the current practices for hardware configuration management. It is stated that the
software configuration management plan and the software quality assurance plan are closely
related and that these two plans may be combined. The plans (separately or combined) address
the identification, control, status accounting, media control, and configuration audits of the
operational software and of the hardware and software used to support the development. The
intent is to offer visibility of the configuration management process to the installers and
regulatory agencies. The configuration management plan centers on the use of part numbers
associated with functional components of the system, thus defining replaceable units. The plans
should identify the disciplines involved in the development, production, and post-certification of
the project related to configuration management or quality assurance. It is stated that a
problem reporting and corrective action procedure should be introduced for Level 1 and Level 2
software and that such a procedure is also desirable for use with Level 3 software.

The  standard  describes  a  series  of  fourteen  documents  that  describe  the  system.  The  respon¬ 
sibility  for  the  creation  of  these  documents  lies  with  the  development  organization.  The  docu¬ 
ments  provide  information  about  the  software;  for  example,  a  programmer’s  manual,  source 
code  in  both  machine  and  human  readable  forms,  and  executable  code  are  included.  Other 
information  in  these  documents  describes  the  development  and  support  environment,  the  soft¬ 
ware  requirements,  the  design  description,  and  management  activities  such  as  the 
configuration  management  plan  or  the  accomplishment  summary,  a  document  used  mainly  for 
certification  purposes. 

5.4  IEC-880 

This  standard  [97],  developed  for  the  international  nuclear  power  industry,  is  primarily  con¬ 
cerned  with  developing  software  for  the  safety  systems  of  nuclear  power  plants.  Essentially,  it 
may  be  considered  to  consist  of  two  parts.  The  main  body  forms  the  requirements  on  devel¬ 
opment  and  indicates  the  particular  requirements,  some  rationale,  and  comments  on  how  the 
requirements may be met. The second part is a series of appendices giving detailed
requirements to back up the requirements offered in the main body.

The  standard  mandates  neither  particular  techniques  nor  even  classes  of  techniques.  The 
standard  suggests  that  the  project  should  be  divided  into  a  number  of  self-contained  but  mu¬ 
tually  dependent  phases.  For  any  safety-related  application,  none  of  the  identified  phases  will 
be  omitted.  The  entire  life  cycle  must  be  considered  and  each  phase  of  the  software  life  cycle 
should  be  divided  into  elementary  tasks  with  a  well-defined  activity  for  these  tasks.  Each  prod¬ 
uct  will  be  systematically  checked  after  each  phase.  Each  phase  will  end  with  a  critical  review 
(part  of  the  verification  process  for  the  project). 

The  software  requirements  must  be  derived  from  the  requirements  of  the  safety  systems  and 
describe  the  product,  not  the  project.  They  describe  what  has  to  be  done,  not  how  it  has  to  be 
done.  An  appendix  offers  guidelines  for  the  content  of  the  software  requirements.  This  in¬ 
cludes  a  complete  list  of  system  functions  with  a  detailed  description  that  relates  the  functions 
to  one  another  and  to  system  inputs  and  outputs.  Risk  considerations,  recommendations  for 
functions or other safety features, and other items providing background information on
specific requirements may be included, as they may serve as background for licensing even if unused in
development. The interfaces between the safety system and any other systems will be
documented to indicate the specific interfaces and related software requirements. The computer
software  must  continuously  supervise  both  itself  and  the  hardware.  This  supervision  is  a  pri¬ 
mary  factor  in  achieving  overall  system  reliability.  The  standard  notes  that  the  requirements 
should  be  presented  according  to  a  standard  whose  formality  does  not  preclude  readability. 
The requirements should be unequivocal, testable or verifiable, and realizable. The standard
suggests  that  using  a  formal  specification  language  may  help  to  show  the  coherence  or  com¬ 
pleteness  of  the  software  requirements. 


The  standard  makes  some  design  recommendations,  including  decomposition  of  the  software 
into  modules,  top-down  rather  than  bottom-up  development  as  well  as  the  avoidance  of  pro¬ 
gramming  tricks,  recursive  structures  and  unnecessary  code  compaction.  No  particular  lan¬ 
guage  is  required,  although  it  is  recommended  that  the  language  should  have  a  thoroughly 
tested  translator  and  be  completely  and  unambiguously  defined.  The  standard  makes  some 
requirements  on  documentation  of  the  design  and  the  program  that  includes  a  software  per¬ 
formance  specification  and  other  documentation  that  may  assist  verification. 

Verification is described as addressing the adequacy of the software requirements in fulfilling
the safety system requirements assigned to the computer system, of the system software
design in fulfilling the requirements, and of the final system source code in fulfilling the software
performance specification. Verification is to take place according to the verification plan. This
plan should be sufficiently detailed so that verification may be performed by an independent
group. This group should be managerially distinct from the development group.

The standard makes requirements on the integration phase where software and hardware are
put together. The procedures used to put the pieces together depend on the specific project;
however, they should be documented in an integration plan and must cover the acquisition of
the proper modules, the integration of hardware modules, the correct linkage of software,
preliminary tests of the integrated function, and the formal release of the integrated system to
verification testing. When the entire computer system is tested, the standard recommends that
the tests cover all signal ranges and the ranges of computed or calculated parameters, cover
the voting and other logic and logic combinations, be made for all trip or protective signals in
the final assembly, and ensure that accuracy and response times are confirmed.
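
As an illustration of covering "voting and other logic and logic combinations", the sketch
below exhaustively checks a voter against every input combination. The two-out-of-three
scheme is an assumption chosen for the example; the standard does not prescribe a particular
voting arrangement.

```python
from itertools import product
from typing import Iterator, Tuple


def two_out_of_three(a: bool, b: bool, c: bool) -> bool:
    """Hypothetical trip voter: trip when at least two channels demand it."""
    return (a and b) or (a and c) or (b and c)


def all_voter_cases() -> Iterator[Tuple[Tuple[bool, ...], bool]]:
    """Yield every input combination with the output expected from a majority count."""
    for inputs in product((False, True), repeat=3):
        yield inputs, sum(inputs) >= 2


def check_voter() -> int:
    """Compare the voter with the expected output for all combinations.

    Returns the number of combinations checked.
    """
    checked = 0
    for inputs, expected in all_voter_cases():
        assert two_out_of_three(*inputs) == expected, inputs
        checked += 1
    return checked
```

Exhaustive enumeration is practical here only because the logic has three binary inputs;
analogue signal ranges would need boundary and range tests instead.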

The standard requires that a formal modification procedure be established that
includes verification and validation. Requests for modification should include the reason
for the request, the functional scope, the aim, and the originator. Any request will be evaluated
independently, resulting in a rejection, an approval, or a requirement for further
detailed analysis. After a modification has been made, verification and validation must be
performed according to the analysis of the impact of the modification.
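
The content of a modification request and the three possible outcomes can be sketched as
follows. This is an illustrative sketch only: the type names are hypothetical, and treating
"evaluated independently" as "not evaluated by the originator" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    REJECTED = "rejected"
    APPROVED = "approved"
    FURTHER_ANALYSIS = "further detailed analysis required"


@dataclass(frozen=True)
class ModificationRequest:
    # The four items the standard says a request should include.
    reason: str
    functional_scope: str
    aim: str
    originator: str

    def missing_fields(self) -> List[str]:
        """Names of any required items left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]


def evaluate(
    request: ModificationRequest,
    evaluator: str,
    decide: Callable[[ModificationRequest], Decision],
) -> Decision:
    """Evaluate a request, refusing incomplete or self-evaluated requests."""
    missing = request.missing_fields()
    if missing:
        raise ValueError(f"incomplete request, missing: {missing}")
    if evaluator == request.originator:
        # Assumption: independence means a different person evaluates.
        raise ValueError("evaluation must be independent of the originator")
    return decide(request)
```

The `decide` callable stands in for the human judgment involved; the sketch only enforces
the bookkeeping around it.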

The  standard  makes  requirements  on  the  operation  of  the  software.  The  requirements  include 
commissioning  tests  and  man-machine  interaction  tests  to  ensure  that  the  operator  cannot  al¬ 
ter  any  of  the  program  logic.  Further,  the  operators  are  required  to  be  trained  in  the  system, 
including  training  on  a  system  equivalent  to  the  actual  system. 

In summary, the IEC-880 standard does not specify particular techniques, but rather discusses
the type of work that must be performed in the development of software for the shutdown
system  of  a  nuclear  power  plant.  This  standard  is  unusual  in  the  field  of  standards  as  it  makes 
requirements  about  operator  training  as  well  as  requirements  on  the  software  and  develop¬ 
ment  process. 

5.5 SafeIT

The SafeIT documents ([17], [18]), developed by the UK Department of Trade and Industry,
while not a standard in themselves, describe an approach to standards that is technically
interesting. SafeIT is described in two volumes; Volume 1 [17] describes the rationale behind
the work described in Volume 2 [18].

The aims of the program are to assist in the development of technically sound, feasible,
generic international standards with appropriate domain-specific standards that are consistent
with the generic standards. Secondary aims are to encourage the use of software in
safety-related applications and to encourage the adoption of best development practices in relation to
the software. Related to this is the aim of ensuring that use of software enhances rather than
decreases system safety.

The rationale discusses the fact that there are already a number of domain-specific standards
for the development of safety-related systems. One of the problems is that the domain-specific
standards are not written in any coherent way, making it difficult to translate solutions from one
domain to another. To achieve the aims, Volume 1 identifies four key areas of activity that
require a coordinated approach: standards and certification, research and development,
technology transfer, and education and training. One of the activities in standards and certification
is the development of the standards framework, described in Volume 2.

The  second  volume  details  work  that  has  already  taken  place  towards  the  development  of 
standards.  The  volume  is  presented  in  two  parts,  the  first  proposing  a  framework  for  safety  re¬ 
lated  standards  and  the  second  discussing  methods  for  standards  development. 

The objectives in developing the standards framework were to develop common concepts and
terminology  (e.g.,  concepts  such  as  integrity  and  risk),  to  develop  a  set  of  agreed-upon  princi¬ 
ples,  to  develop  an  agreed-upon  set  of  safety  objectives  for  the  assurance  of  integrity  in  soft¬ 
ware  systems  (the  safety  objectives  should  be  common  to  all  applications  and  levels  of  safety), 
to  provide  information  about  technical  and  process  oriented  techniques,  to  develop  a  method 
that  can  be  employed  by  standards  groups  so  that  they  can  systematically  develop  standards 
that  meet  the  safety  objectives,  to  give  examples  of  requirements  for  each  level  of  safety,  and 
to  allow  existing  and  proposed  standards  to  be  incorporated  into  the  framework. 

The  structure  of  the  framework  will  be  based  on  core  standards  that  relate  to  the  framework. 
Surrounding  these  core  standards  will  be  generic  and  domain-specific  standards.  Auxiliary 
standards  that  define  activities  or  techniques  or  methods  will  support  the  generic  and  domain- 
specific  standards. 

Auxiliary standards defining particular activities, techniques, or methods can be separated out
and developed to support most domains. SafeIT has already identified candidate activities for
auxiliary  standards  such  as  quality  assurance,  configuration  management,  programming  lan¬ 
guages,  hazard  analysis  techniques,  and  others.  The  use  of  auxiliary  standards  means  that 
the  framework  developers  should  be  able  to  take  advantage  of  existing  standards  work.  The 
fundamental  concern  when  developing  an  auxiliary  standard  is  to  consider  whether  it  address¬ 
es  an  aspect  of  software  development  that  can  be  separated  usefully.  Examples  are  topics  not 
specific  to  safety  such  as  security  or  reliability,  or  a  self-contained  technique  or  method. 

The  core  standards  in  the  proposed  framework  will  include  standards  describing  common 
safety principles, definitions of common terms and concepts, and a common standards
development method. The method is necessary if there is to be a possibility of harmonizing the
various standards. The second part of Volume 2 describes the core standards in greater detail.

A number of common principles are outlined. These include principles such as safety being a
system rather than a software concern, that safety must be built into a system rather than
added on, and that the acceptable level of safety is a balance between the risks, benefits, and
costs.

The  discussion  of  terms  and  concepts  centers  around  the  fact  that  there  are  already  a  number 
of  standards  activities  with  differing  views  on  many  fundamental  concepts  such  as  the  number 
of  levels  of  safety  within  a  system.  For  the  framework  approach  to  work,  it  must  describe  the 
most  general  set  of  concepts  onto  which  the  concepts  employed  in  existing  standards  may  be 
mapped. 

The  life-cycle  concept  is  important  since  it  is  in  the  context  of  a  life-cycle  that  terms  and  con¬ 
cepts  have  meaning.  The  framework  proposes  to  focus  on  three  different  types  of  life-cycles: 
safety,  procurement,  and  development.  The  approach  taken  has  been  to  consider  each  life- 
cycle  as  a  group  of  sufficiently  large  and  general  phases  into  which  real  models  can  be  fitted. 
The document makes it clear that although specific proposed standards have been adopted
for the framework, this adoption is not intended to limit discussion or use of other
life-cycles, but rather to serve as a focus for discussion.

Certain roles across all of the standards are also expected to be identified and standardized.
These  roles  include  procurers,  developers,  users,  and  others.  Terms  that  are  used  to  describe 
the  framework  itself  are  also  expected  to  be  standardized. 

A standard contains requirements on the process or product. The method proposed for
developing standards discusses what factors should be taken into consideration for the
development of standards. These factors start with the definition of the overall objectives of the
standard. These overall objectives are refined into a set of more detailed objectives and a
range of techniques that can meet these objectives is outlined. Integrated sets of techniques
are selected and the rationale for the selection is documented. Finally, the standard is
produced describing the objectives, the techniques expected to satisfy the objectives, and
the rationale for the choice of techniques. It is noted that some techniques exclude others
while others fit particularly well. Standards writers are warned to be aware of the problems of
combining techniques. Additionally, they are advised that schemes should be developed that
reduce the possibility of the users of the standards selecting inappropriate combinations. The
method outlined above is then used to further define the framework standard.

In summary, the SafeIT approach is the development of a framework into which particular
standards may be incorporated. The framework includes a method for developing standards
that will fit into SafeIT and will, the authors claim, lead to clearer standards for the developers
of safety-critical software. The SafeIT approach is interesting as it proposes an approach that
will allow for the potential unification of the conflicting and competing maze of international
standards.

5.6  Effects  of  Standards 

Standards  may  have  a  number  of  positive  effects  including  the  provision  for  a  common  archi¬ 
tecture,  a  common  vocabulary,  and  a  statement  of  a  minimal  level  of  compliance  from  the 
community.  They  may  also,  however,  have  some  negative  effects.  These  effects  are  dis¬ 
cussed  below. 

5.6.1  Standard  Is  Inappropriate 

A  standard  may  be  inappropriate  for  a  number  of  reasons.  The  most  likely  reasons  are  that 
either the standard is outdated or the technology defined in the standard may not be readily
applicable. 

A standard may be out of date because it takes a long time to create or revise standards. As
the process proceeds the standard takes on an inertia with respect to change and becomes
more resistant to changes based on current technologies. Once released and accepted by the
community, there is community resistance to changes in the standard because the community
may have to change its practices to be compliant with the new standard. Thus, standards
have long life spans and, in a rapidly changing technological area such as software
engineering, are often out of date (sometimes even before the standard is released). To prolong the
useful life span of a standard, the standards developers may attempt to standardize on a
technology that is only just coming out of the research community. Generally, the standardization
committee has a belief in the value of such research technology and may even perform a
number of experiments to convince itself of the value of the technology. Unfortunately, the
technologies may not be proven to operate on large-scale systems or in a domain as wide as the
domain to which the standard applies. Further, the technology may simply be inappropriate for
some systems for which the standard requires its use.

5.6.2  Standard  Is  Ineffective 

If the purpose of a standard is to bring the community to an acceptable level of quality, it is
important that the level of quality defined by the standard is acceptable to the system
procurers.

The contents of a standard depend in large part on the members of the committee given the
task of creating the standard. If the committee is made up largely of developers, then it
is in the best interests of the committee members to make the standard as weak as possible
so that they will not have to change their current practices to conform to the standard. If, on
the other hand, the committee is made up largely of procurers, the standard may indeed have
sufficient strength to bring the development practices of the community up to the desired level.
Unfortunately, there are many standards which have been constructed by the development
rather than procurement communities. This practice has led to a large number of ineffective
standards.

5.6.3  Standard  Induces  Minimal  Compliance 

Standards describe a minimum level of compliance, and the minimum practices and
technologies that developers must employ for the developed product to be acceptable.
Unfortunately, there is a distinct possibility that a developer may look at the standard as the maximum
level of quality to which they need to aspire. After all, once the standardized practices and
technologies have been employed, the developers have met the criteria. Why do more,
especially if doing more than required by the standard costs money? Thus, the standard may
induce only minimal compliance, becoming a ceiling on quality rather than the hoped-for floor.

Related to the issue of minimum compliance is an interesting issue of liability. The question,
currently unresolved, is "Who is liable if a product developed according to a standard fails?"
The developers will argue that they met the standard and thus were not negligent in the
development of the product.

6  Conclusions


Finally, we present the conclusions drawn through the creation of the annotations in the
bibliography and the writing of this report. We then describe potential future work in the
development of safety-critical systems.

6.1  Conclusions 

The creation of software to be included in safety-critical systems is a hard task. Certainly it
requires more care and thought than the development of software in other systems. Whenever
developers are creating software that may, within the context of a system, threaten life or
property, they must use the most careful approaches possible. In many cases, it will not be possible
to adequately test the software in an operational situation because these cases involve
systems that must not be allowed to fail; weapon systems are one such class of systems.

Safety is not an attribute that can be added to software after the event; it must be designed
into the software from the start, and it must be constantly checked to ensure that unexpected,
unsafe functions have not been added and that necessary functions have not been removed. Thus,
development of safety-critical software depends on appropriate system requirements
engineering, system hazard identification, and system design, as well as on software requirements
engineering, design, and development.

System engineering is particularly important because we still have an imperfect understanding
of the ways in which software failures can affect the system. It is important, wherever possible,
to offer alternative backups to the safety-critical software that allow the system operators to
perform degraded, yet safe, operation of the system.

There  is  no  substitute  for  high-quality  developers,  particularly  when  determining  the  ways  in 
which  the  system  may  fail  and  thus  lead  to  potential  mishaps. 

Software fault tree analysis is a promising approach to ensuring that the software will not lead
to a mishap; however, it relies on knowing in advance what the possible failures of the system
could be. The use of fault tree analysis for analysis of the design and the specification is a
valuable step towards ensuring that the system will be safe early rather than late in the
development process. Unless there is improvement in the determination of conformance of an
implementation to a design or specification, however, the use of fault tree analysis at the
specification or design levels will not replace the use of safety analysis at the implementation level.

Formal methods are discussed throughout the literature as a potential solution to the issue of
ensuring conforming implementations. However, these are not a complete solution. While
formal methods help reduce system faults by avoiding the insertion of errors into the
implementation, they do not help with random, low-probability faults such as a component failure
in the computer or sensors. These faults need to be masked using fault-tolerant techniques
both in the system hardware and software. Further, formal methods, by themselves, do not
remove errors introduced at the level of the system requirements.

A requirements (or specification) error may lead to implementations that conform to the written
description of the system but still result in an unsafe system. Thus, the requirements and
specifications must be analyzed to ensure that an acceptable level of risk has been achieved. Of
particular importance are the tradeoffs between safety and other quality factors such as
performance or security.

Standards  can  help  in  the  development  of  safety-critical  systems  in  that  they  state  guidelines 
by  which  the  system  should  be  developed.  However,  they  are  not  a  panacea  as  they  may  be 
out  of  date,  inapplicable  to  the  particular  domain,  or  induce  only  minimal  compliance.  It  should 
be  stressed  that  standards  describe  the  minimum  effort  required  to  engineer  safe  systems  and 
that  developers  should  be  strongly  encouraged  to  exceed  the  standards. 

6.2  Further  Work 

This report has been an attempt to capture in written form the information obtained from
reading the literature on software safety. There are many directions which may be pursued from
this point.

1. There are various classes of safety-critical systems in operation. Examples of
different classes are nuclear reactor shutdown systems, which, after detecting
a hazardous state, perform hazard recovery by shutting down the reactor and
then need not operate again, and avionics flight control systems, which, after
detecting a hazardous state, must avoid the hazard and then continue to
operate and control the aircraft. These different classes may work best with
different architectural designs.

A valuable contribution to the development of safety-critical software would
be a classification of criticalities with indications of architectures that have
been accepted as safe in the different classifications. Then, for any new
system, the developers could classify their system and use that classification to
suggest a requirements model and an architectural structure for the system.

2. Convincing examples of the application of the techniques described in this
report are needed. The inertia found in development organizations is such
that even though the problem of developing safe software will not go away,
developers still need to be convinced that more formal techniques will help
solve the development problem. A number of techniques have been
developed and tested by academics on small examples (there have also
been some publications concerning significant systems such as the
Darlington nuclear reactor [177]). Examples of significant application of the
techniques described in this report are one way to convince developers.

3. There are currently many standards pertaining to software safety, with varying
levels of safety induced through the use of the standards. One problem is that
many of the existing standards are out of date. One of the more interesting
approaches to standards is the notion of a standards framework into which
standards specific to a domain may be inserted. Consistent with this view is
the notion of a meta-standard which describes the nature of the techniques
that are to be employed and allows the developers to instantiate the standard
with techniques currently being used by the developers. Such an approach
means that the standards body does not standardize out-of-date or untested
technology. The disadvantage is that the people responsible for deployment
of the system must first approve the development approach recommended by
the system developers. This requires new skills for those responsible for
system deployment.

4. There appear to be two almost competing camps in the development of
safety-critical software. The formalists suggest that errors (and therefore
hazards) can be eliminated by using formal methods; the safety analysts
examine the system artifacts after construction for potentially hazardous
behaviors. It seems reasonable that a combination of the two approaches will
lead to the development of the safest software and that the interactions
between these two approaches should be investigated further.

dissertation, ICS Dept., University of California, Irvine, 1991.

[39]  P.C.  Clements.  Engineering  more  secure  software  systems.  In  COMPASS  '87 
Computer  Assurance,  pages  79-81,  Washington,  D.C.,  July  1987. 

Considers safety, security, and other requirements as special cases of a
general desire to ensure that a particular hardware and software system behaves
as expected. In order to ensure that this is the case, the first step is to write
down what is expected of the system. The A7-E style of writing specifications
of requirements is briefly discussed with some arguments that indicate it is
possible to be confident that an implementation satisfies the specification.

[40]  U.S.  Atomic  Energy  Commission.  Reactor  Safety  Study:  An  Assessment  of 
Accident  Risks  in  the  U.S.  Commercial  Nuclear  Power  Plants.  Report  WASH- 
1400,  1975. 

[41] Committee on Science, Space, and Technology, U.S. House of
Representatives. Bugs in the Program: Problems in Federal Government
Computer Software Development and Regulation. Staff study, technical
report, U.S. Government Printing Office, Washington, D.C., September 1989.

[42]  B.  Connolly.  Software  Safety  Goal  Verification  using  Fault  Tree  Techniques.  In 
COMPASS  '89:  IEEE  Fourth  Annual  Conference  on  Computer  Assurance, 
pages  18-21,  Gaithersburg,  MD,  1989. 

[43] B. Connolly. Software Safety Goal Verification using Fault Tree Techniques: A
Critically Ill Patient Monitor Example. In Second Annual IEEE Symposium on
Computer-Based Medical Systems, pages 118-120, Minneapolis, MN, June
1989.

[44] S.D. Crocker. Techniques for Assuring Safety: Lessons from Computer
Security. In COMPASS '87 Computer Assurance, pages 67-69, Washington,
D.C., July 1987.

The paper uses the history of the development of security-critical systems as a
predictor of the future development of safety-critical software. The potentially
overlapping phases are described as heightened visibility for such systems,
introduction of new methodologies, increased availability of hardware support,
use of formal specification, and introduction of formal tools. The paper makes
the point that a key to building safer systems is an increase of effort on
requirements for safety. The paper concludes that any and all of the approaches are
necessary to ensure increased safety.

[45] W.J. Cullyer and W. Wong. A Formal Approach to Railway Signalling. In
COMPASS '90: Computer Assurance, Gaithersburg, MD, July 1990.

[46] N.C. Dalkey. The Delphi Method: An Experimental Study of Group Opinion.
RM-5888-PR, The Rand Corporation, 1969.

[47]  B.K.  Daniels,  editor.  Achieving  Safety  and  Reliability  with  Computer  Systems. 
Elsevier  Applied  Scientific,  November  1987. 

The proceedings of a conference held in 1987 by the Safety and Reliability
Society. The structure of the book follows that of the symposium. Identified trends
are the increased use of formal methods in industry with increased tool support.
Complementing formal methods is the continuing best use of accumulated
experience with software engineering and safety and reliability assessment.
Papers of relevance are annotated separately.

[48] B.K. Daniels, R. Bell, and R.I. Wright. Safety Integrity Assessment of
Programmable Electronic Systems. In Proc. SAFECOMP '83, pages 1-12,
1983.

[49] H.T. Daughtrey. Experiences in Conducting Independent Verification and
Validation of Safety Parameter Display System Software. In American Nuclear
Society International Topical Meeting on Computer Applications for Nuclear
Power Plant Operation and Control, pages 267-273, Richland, WA, September
1985.

[50] R. De Santo. A Methodology for Analyzing Avionics Software Safety. In
COMPASS '88 Computer Assurance, pages 113-118, Gaithersburg, MD, July
1988.

Describes a method that helps identify safety-critical software
functions and helps isolate the safety-critical paths. The method is driven by
documents available if the development is using the 2167-A standard. The
method helps the analyst gain a greater understanding of the system by leading
the analyst through the system in a structured manner. Safety-critical hardware
signals are used as the primary source for identifying the operationally
related safety-critical software functions.

[51] E.S. Dean Jr. Software System Safety. In Proc. Fifth International System
Safety Conference, 1981.

[52] D.E. Denning. Secure Databases and Safety. In T. Anderson, editor, Safe and
Secure Computing Systems, pages 101-111, Blackwell Scientific Publications,
1989.

The paper discusses the four categories of security requirements on database
systems: authorization; data consistency; availability; and identification,
authentication, and audit. The applicability of the security policies to system safety
is shown in each category. The paper concludes by showing how database
security policies assist in the generation of a safe system. However, the paper
also points out ways in which the security policy may conflict with the safety
policy and tentatively suggests ways in which these conflicts may be avoided.

[53] Department of Defense. Military Standard 1629A: Procedures for Performing a
Failure Mode, Effect and Criticality Analysis. Department of Defense, 1984.

This  standard  describes  failure  modes,  effects  and  criticality  analysis  and  the 
circumstances  under  which  such  analysis  should  be  applied.  The  standard 
does  not  apply  to  software,  but  to  hardware  to  be  acquired  by  the  DoD. 

[54] Department of Defense. Military Standard 882B: System Safety Program
Requirements. Department of Defense, 1984.

This standard outlines the tasks that must be described in the program contract
to satisfy DoD regulations on system safety. The tasks are divided into
three areas: those associated with management, those with system design,
and those with software. For each area, the tasks are described in terms of
what the task must achieve. The standard does not require specific techniques,
but requires that the program manager either choose techniques or determine
whether contractor-proposed techniques will lead to an acceptable level of
system safety.

[55]  M.S.  Deutsch  and  R.R.  Willis.  Software  Quality  Engineering:  A  Total  Technical 
and  Management  Approach.  Prentice  Hall,  1988. 

The book covers managerial aspects of inducing quality into software in greater
detail than it covers technical aspects. Safety is considered as one of the
twenty-seven criteria that make up quality, and a few suggestions are made which
are claimed to help with safety management. Some techniques such as software
fault tree analysis, reliability modeling, multi-version software, and
correctness proofs are outlined as approaches for achieving exceptional quality.

[56] J.H. Dobbins. Software Safety Management. In COMPASS '88 Computer
Assurance, pages 108-112, Gaithersburg, MD, July 1988.

The paper focuses on an approach for life cycle management of software
safety continuing into operational phases. Discusses the current government
practice of writing requirements in prose and describes it as the most error-prone
way to describe requirements. There is discussion of other approaches to
describing requirements, such as PSL/PSA and data flow diagrams. Fagan
inspections of the design and code are recommended as a way of removing 70%
of defects prior to unit testing. Automated support for analyzing code is also
discussed. The use of call path analysis is discussed: any path which includes a
safety-critical module or an overly complex module is marked for exhaustive
analysis and stress testing. The results of the call path analysis are used for
determining tests that ensure 100% coverage of the system. The paper
concludes with remarks that safety management must be carried out throughout
the development process and on into the maintenance phase.

[57]  E.L.  Duke.  V  &  V  of  Flight  and  Mission-Critical  Software.  IEEE  Software, 
6(3):39-45,  May  1989. 

Discusses a verification and validation method used at NASA Ames-Dryden.
Analysis and testing are performed on abstract models of the system. The
models include linear-system models, aggregate-system models, block diagrams,
schematics, specifications, and simulations. Flight software is prototyped
and evaluated by pilots and engineers; the prototype is then used as the
basis for a specification from which the actual flight software is produced.
Testing takes place by providing the real software with the same input as the prototype
and comparing results. There is brief discussion of the limitations of the
Ames-Dryden approach, with the suggestion that formal proof and statistical analysis
address the chief limitations. Three levels of criticality are defined, ranging from
systems whose failure may cause loss of life or limb or damage to public safety
to systems whose failure may produce inaccurate results or inefficient use of
resources. More effort is placed into the higher levels of criticality.
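
The comparison step described in this annotation (identical input to prototype and
production software, then a comparison of results) can be illustrated with a minimal
sketch. It is not the Ames-Dryden tooling: the function names, the toy control laws,
and the tolerances are all assumptions made for illustration.

```python
import math


def back_to_back(prototype, production, inputs, rel_tol=1e-9, abs_tol=1e-12):
    """Feed identical inputs to both implementations and collect disagreements.

    Returns a list of (input, prototype_output, production_output) tuples.
    The tolerances are illustrative assumptions, not values from the paper.
    """
    mismatches = []
    for value in inputs:
        expected = prototype(value)
        actual = production(value)
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((value, expected, actual))
    return mismatches


# Hypothetical stand-ins for the evaluated prototype and the flight software.
def prototype_command(pitch_error):
    return 0.5 * pitch_error


def production_command(pitch_error):
    # Saturates at +/-10.0, a behavior the prototype lacks, so large inputs
    # expose a disagreement that the comparison reports.
    return max(-10.0, min(10.0, 0.5 * pitch_error))


agreeing = back_to_back(prototype_command, production_command, [0.0, 4.0, -8.0])
disagreeing = back_to_back(prototype_command, production_command, [30.0])
```

Each disagreement is returned together with the input that produced it, so the analyst
can decide whether the prototype, the specification, or the production code is at fault.
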

[58] J.R. Dunham. Measuring software safety. In COMPCON '84: The Small
Computer (R)Evolution, pages 192-193, Arlington, VA, September 1984.

Presents an approach to measuring software safety using repetitive run
analysis. The paper accepts that this is one form of testing to measure safety and
that other approaches exist. It also makes the distinction between measuring
reliability and safety; the latter must not only estimate the frequency of errors,
but also their severity.

[59] J.R. Dunham. V & V in the Next Decade. IEEE Software, 6(3):47-53, May 1989.

Discusses some factors affecting verification and validation such as age of the
software, reuse, and criticality. Predicts that in the next decade V&V technology
will be mature, covering all phases of the life cycle, that it will be included in
software development environments, that it will rely on formal verification and
statistical quality control, and that it will have guidelines that help select and
combine techniques. Mentions some uses of formal verification techniques.

[60] M. Dunn and W. Hillison. The Delphi Technique. In Cost and Management,
pages 32-36, 1980.

[61] L.G. Egan. Analysis of the Certification Process of Computer Programs Used
in a Nuclear Power Plant, Using the Management Systems Approach.
Technical report, Software Certification Institute, Santa Maria, Ca., Year
unknown.

[62] W.D. Ehrenberger. Fail-Safe Software: Some Principles and a Case Study. In
B.K. Daniels, editor, Achieving Safety and Reliability with Computer Systems,
pages 76-88, September 1987.

The paper argues that one way to achieve safety is by having the software
follow previously executed paths and, whenever a new path is discovered, take
some system-specific safety action. The information on previously executed
paths is generated during testing and may either be control-flow oriented or
data-flow oriented. Control-flow monitoring is achieved by building a tree of
basic block entries during testing such that each path from root to leaf is a trace
of an execution. Data-flow monitoring performs a similar task, but uses array
addressing points rather than basic blocks as the data from which the tree is
built. During execution, the trace may be compared against the tree for validity.
Limitations of the approach are that test data must be similar to operational
data, that timing problems and numerical calculation errors are not handled, and
that there is considerable execution overhead, both in space and time
requirements.
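
The control-flow monitoring described in this annotation can be sketched as a tree of
basic-block identifiers that is built during testing and consulted during operation.
The sketch below is an illustration under stated assumptions, not the paper's
implementation: the class name, the block labels, and the callback interface are
hypothetical.

```python
class PathMonitor:
    """Tree of basic-block entries built from test executions (hypothetical API)."""

    _END = object()  # marks a leaf: a point where a recorded execution finished

    def __init__(self):
        self._root = {}

    def record(self, trace):
        """Testing phase: add one executed sequence of basic-block ids to the tree."""
        node = self._root
        for block in trace:
            node = node.setdefault(block, {})
        node[self._END] = {}

    def check(self, trace, safety_action):
        """Operation phase: follow the trace through the tree.

        Calls safety_action(index) and returns False at the first block that
        leaves every recorded path, or at the end of the trace if it stops
        short of a leaf. Returns True when the trace matches a recorded run.
        """
        node = self._root
        for index, block in enumerate(trace):
            if block not in node:
                safety_action(index)
                return False
            node = node[block]
        if self._END not in node:
            safety_action(len(trace))
            return False
        return True


monitor = PathMonitor()
monitor.record(["entry", "read_sensor", "in_range", "update", "exit"])
monitor.record(["entry", "read_sensor", "out_of_range", "alarm", "exit"])

deviations = []  # stands in for a system-specific safety action
known = monitor.check(
    ["entry", "read_sensor", "in_range", "update", "exit"], deviations.append
)
unknown = monitor.check(
    ["entry", "read_sensor", "out_of_range", "update", "exit"], deviations.append
)
```

The second trace uses only blocks that were seen in testing, but in an order that no
test produced, so the monitor reports a deviation at the fourth block. This also shows
the limitation noted above: a path is accepted only if the test data exercised it.
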

[63] C.A. Ericson Jr. Software and System Safety. In Proc. Fifth International
System Safety Conference, vol. 1, part 1, pages III-B-1 to III-B-11, Denver,
1981.

[64] EWICS TC 7. Guidelines for the Maintenance and Modification of Safety-
Related Computer Systems. EWICS, November 1987.

[65] EWICS TC 7. Safety Assessment and Design of Industrial Computer Systems:
Techniques Directory. EWICS, November 1987.

[66] Food and Drug Administration. Reviewer Guidance for Computer-Controlled
Medical Devices (Draft). Food and Drug Administration, July 1988.

[67] H.H. Frey. Safety and Reliability: Their Terms and Models of Complex
Systems. In Proc. SAFECOMP '79, pages 3-10, 1979.

[68] A.W. Friend. An Introduction to Software Safety. In Seventh Annual
Conference of the IEEE Engineering in Medicine and Biology Society, pages
1232-1237, Chicago, Ill., 1985.

[69] R.C. Fries and R.T. Riddle. A Software Quality Assurance Procedure to Assure
a Reliable Software Device. In Second Annual IEEE Symposium on Computer
Based Medical Systems, pages 135-138, Minneapolis, MN, June 1989.

[70] P. Froome and B. Monahan. The Role of Mathematically Formal Methods in the
Development and Assessment of Safety Critical Systems. Microprocessors
and Microsystems, 12(10):539-546, December 1988.

[71] R.U. Fujii. Software Safety Analysis Is an Integral Part of Systems Engineering,
Not a Separate Adjunct. In COMPASS '87 Computer Assurance, page 73,
Washington, D.C., July 1987.

Argues that software safety analysis must be part of the system engineering
process. The most important factor is that, during concept and design
formulation, tradeoffs between performance and safety must be considered to
achieve optimal system features.

[72] K. Geary. Beyond Good Practices: A Standard for Safety Critical Software. In
B.K. Daniels, editor, Achieving Safety and Reliability with Computer Systems,
pages 232-241, September 1987.

Describes the changes to the UK Naval Engineering Standard 620 (NES 620)
made for safety-critical software. The paper is interesting in that it provides
reasoning similar to that behind the development of MOD 00-55 and MOD 00-56.
Indeed, the author argues that the changes written into the standard are
sufficiently similar to those of the MOD standards and that NES 620 should be
superseded by the MOD standards.

[73] G. Gloe and O. Nordland. Qualification and Licensing of Computer-Based
Systems for Safety Tasks in German Light Water Reactors. In American
Nuclear Society International Topical Meeting on Computer Applications for
Nuclear Power Plant Operation and Control, pages 326-329, Richland, WA,
September 1985.

[74] G. Gloe and G. Rabe. Experience with Computer Assessment. In Safety and
Reliability of Programmable Electronic Systems, pages 145-151, Essex,
England, 1986. Elsevier.

[75] S.G. Godoy and G.J. Engels. Software Sneak Analysis. American Institute of
Aeronautics and Astronautics (77-1386), pages 63-67, 1977.

[76] J. Goldberg. Some Principles and Techniques for Designing Safe Systems.
ACM SIGSOFT Software Engineering Notes, 12(3):17-19, July 1987.

[77] D.I. Good. Predicting Computer Behavior. In COMPASS '88 Computer
Assurance, pages 75-83, Gaithersburg, MD, July 1988.

The paper argues that the only way to assure the safety of a software system
is to be able to predict that the system will behave acceptably in the future. A
mathematical model of a computer system is described and used to derive
equations that support arguments about the number of states, requirements on
acceptance tests, and the effectiveness of testing in general. The
paper discusses issues of completeness, magnitude, instability, the definition
of acceptable behavior, and approaches to the demonstration of acceptable
behavior. It is argued that the only way to scale up to real systems is by use of
all available mathematics, and an approach is described. The paper concludes
with a comparison of a vision of future practice, where prediction of the
computer system is possible, against current practice, where very little of the necessary
mathematical foundation exists.

[78] J. Gorski. Design for Safety Using Temporal Logic. In IFAC SAFECOMP '86,
pages 149-155, Sarlat, France, 1986.

[79] J. Gorski. Formal Support for Development of Safety Related Systems. In B.K.
Daniels, editor, Achieving Safety and Reliability with Computer Systems, pages
14-28, September 1987.

[80] R. Greenberg. Software Safety Using FTA Techniques. In Safety and
Reliability of Programmable Electronic Systems, pages 86-95, Essex, England,
1986. Elsevier.

[81] J.G. Griggs. A method of software safety analysis. In Proc. 5th Int. System
Safety Conf., volume 1, part 1, pages III-D-1 to III-D-18, Denver, 1981.

[82] G. Gruman. Software Safety Focus of New British Standard, Def Stan 00-55.
IEEE Software, 6(3):95-97, May 1989.

[83] G. Guiho and C. Hennebert. Sacem Software Validation. In 12th International
Conference on Software Engineering, pages 186-191, Nice, France, March
1990.

[84] M.D. Hansen. Survey of Available Software-Safety Analysis Techniques. In
Annual Reliability and Maintainability Symposium, pages 46-49, Atlanta, Ga.,
January 1989.

[85]  M.D.  Hansen  and  R.L.  Watts.  Software  System  Safety  and  Reliability.  In 
Annual  Reliability  and  Maintainability  Symposium,  pages  214-217,  Los 
Angeles,  Ca.,  1988. 

[86]  L.  Hatchard.  Applying  the  Principles  of  the  HSE  Guidelines  to  Programmable 
Electronic  Systems  in  Safety-Related  Applications.  Safety  &  Reliability , 
8(1):30-36,  spring  1988. 

[87]  D.L.  Hauptmann.  A  Systems  Approach  to  Software  Safety  Analysis.  In  Proc. 
Fifth  International  System  Safety  Conference,  1 981 . 

[88] K. Hayman. An Analysis of Ordnance Software Using the Malpas Tools. In Fifth
Annual Conference on Computer Assurance, pages 86-94, Gaithersburg, MD,
June 1990.

[89] Health and Safety Executive. Programmable Electronic Systems in Safety-
Related Applications, Volume 1, An Introductory Guide. Her Majesty's
Stationery Office, London, England, 1987.

[90] Health and Safety Executive. Programmable Electronic Systems in Safety-
Related Applications, Volume 2, General Technical Guidelines. Her Majesty's
Stationery Office, London, England, 1987.

[91] K.A. Helps. Some Verification Tools and Methods for Airborne Safety-Critical
Software. Software Engineering Journal, pages 248-253, November 1986.

[92] M.F. Houston. What Do the Simple Folk Do? Software Safety in the Cottage
Industry. In COMPASS '87 Computer Assurance, pages S-20 to S-24,
Washington, D.C., July 1987.

Makes the case that a major cause of problems is lack of attention to
requirements, planning, and early design. It is suggested that it is easier to
look at the hazards in a system than at all of the possible errors in that
system. A brief outline of a method is presented. The method includes preliminary
hazard analysis and system hazard cross checks, both techniques being relatively
simple and cheap to apply. The point is made that the format of the requirements
is relatively unimportant, but that the requirements must be clear and
verifiable. Indeed, for each requirement there should be a statement of how that
requirement may be verified. The point is also made that software is, by itself,
always safe; it is only in the context of a system that software may become
unsafe. This means that the hazard analysis originally undertaken must be carried
through the software development.

[93] Institute of Electrical Engineers. Health and Safety Legislation, and Consumer
Legislation: Guidance for the Engineer: Professional Brief. London, 1988.

termining the value of the user interface. Software correctness is considered as
being a combination of subsystem correctness (the implementation of the sub¬

given to define risk. Software itself is safe; risk arises when software controls

ance techniques are discussed. Designs that include a guaranteed safe state
and those with reduced functionality are described.

[129] N.G. Leveson. Building Safe Software. In COMPASS '86 Computer Assurance,
pages 37-50, 1986.

[130]  N.G.  Leveson.  Software  Safety.  SEI  Curriculum  Module  SEI-CM-6-1.1 
(Preliminary),  Software  Engineering  Institute,  July  1987. 

[131]  N.G.  Leveson.  What  Is  Software  Safety?  In  COMPASS  '87  Computer 
Assurance,  pages  74-75,  Washington,  D.C.,  July  1987. 

This  position  paper  argues  that  building  perfect  software  is  an  unrealistic  goal; 
however,  software  need  not  be  perfect  to  be  safe.  It  continues  by  discussing 

opment changes presented in the paper is not expected to be complete or optimal,
but rather to suggest requirements on the development activity. The pa¬